Zeppelin architecture


Zeppelin architecture

York Huang
Hi,

I am new to Zeppelin and have a few questions.
1. Should I install Zeppelin on a Hadoop edge node and have every user access it from a browser? Or should every user install their own Zeppelin?

2. How do I run standard Python without using Spark?

3. Can I install Zeppelin on a Windows server?

4. Is it possible to share data between interpreters?

Thanks

York


Re: Zeppelin architecture

moon
Administrator
Hi York,

Thanks for the question.

1. How you install Zeppelin is up to you and your use case. You can either run a single instance of Zeppelin, configure authentication, and let many users log in, or let each user run their own Zeppelin instance.
I see both setups among users, and it really depends on your environment.

2. Starting with the 0.6.0 release, Zeppelin ships a Python interpreter. You can try %python.

3. You can run Zeppelin on Windows by running bin/zeppelin.cmd.

4. Interpreters can share data through the resource pool. You can think of the resource pool as a distributed map shared across all interpreters. Although every interpreter can access the resource pool, only a few interpreters expose an API that lets users access it directly.

SparkInterpreter, PySparkInterpreter, and SparkRInterpreter are the interpreters that expose the resource pool API to users. You can access the resource pool via the z.get() and z.put() APIs. Check [1].
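
As a rough illustration of the z.put() / z.get() pattern from point 4, here is a minimal sketch in plain Python. Inside a Zeppelin paragraph the object `z` is supplied by the interpreter; the `ResourcePoolStub` class below is a hypothetical stand-in (it is not part of Zeppelin) that only mimics the map-like behaviour so the snippet can run on its own. The key name `shared_total` is also just an assumption for the example.

```python
class ResourcePoolStub:
    """Hypothetical stand-in for Zeppelin's `z` object; NOT Zeppelin code.

    It only mimics the map-like put/get behaviour of the resource pool.
    """

    def __init__(self):
        self._pool = {}

    def put(self, name, value):
        # Store a value under a name, like an entry in a shared map.
        self._pool[name] = value

    def get(self, name):
        # Look the value up by name; the stub returns None if it is absent.
        return self._pool.get(name)


# In a real notebook, skip this line: the interpreter already provides `z`.
z = ResourcePoolStub()

# Paragraph A (for example %pyspark): publish a value under a key.
z.put("shared_total", 42)

# Paragraph B (another interpreter that exposes the API): read it back.
shared_total = z.get("shared_total")
print(shared_total)
```

In Zeppelin itself only the z.put(...) and z.get(...) lines would go into the paragraphs; everything else exists only to make the sketch self-contained.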


Thanks,
moon



Re: Zeppelin architecture

York Huang
Hi Moon,

Thanks for your response.

I have a MapR 4.1 cluster and would like to use Zeppelin on it. If I install Zeppelin on an edge node, what security should I set up? The online documentation is a bit confusing. Basically, I want every user to have their own account (either an AD account or a newly created Zeppelin account).

Is there any guide?

Thanks,

York


Running R on Zeppelin EMR Cluster

Mark Mikolajczak - 07855 306 064

Hi All,

I am trying to set up the R interpreter to run in Zeppelin, which is currently running on EMR. Zeppelin is working perfectly and I am able to write scripts in Scala and Python. When I use %r, %sparkR or %knitr I receive an error: "r interpreter not found"

The applications which I have running in my emr-4.7.2 cluster are: Hive 1.0.0, Zeppelin-Sandbox 0.5.6, Spark 1.6.2, Pig 0.14.0

Within the interpreter there is no mention of R, so I figure I am missing something but do not know what.

Any pointers would be greatly appreciated.


Re: Running R on Zeppelin EMR Cluster

Hyung Sung Shim
Hi.
Unfortunately, Zeppelin 0.5.6 does not support the R interpreter.
Could you upgrade Zeppelin to a newer version?


Re: Running R on Zeppelin EMR Cluster

Hyung Sung Shim
Also, EMR-5.0.0 supports Zeppelin 0.6.1.



Re: Running R on Zeppelin EMR Cluster

Mark Mikolajczak - 07855 306 064
Thanks, I was afraid that was the solution.

I am connecting to a Couchbase database, and the connector only supports Spark 1.6. By upgrading I would be using EMR-5.0.0, which seems to only run Spark 2.0…
 


Re: Running R on Zeppelin EMR Cluster

Jonathan
Mark,

I see in the couchbase-spark-connector GitHub project that they have already upgraded to Spark 2.0 (https://github.com/couchbase/couchbase-spark-connector/pull/9), but this change has not yet been released in a new version. According to the discussion on that pull request, it sounds like they are hoping for a new version this month.

As for using the R interpreter on emr-5.0.0, unfortunately EMR does not yet (officially) support the R interpreter. I expect that we (I'm from EMR, btw) would be able to support it eventually, but I'm unable to give any ETA on that.

~ Jonathan


Re: Running R on Zeppelin EMR Cluster

Mark Mikolajczak - 07855 306 064
Thanks for sharing.

It is disappointing that R is not available on EMR. I will look out for updates.

Regards,
Mark




Re: Zeppelin architecture

York Huang
Hi Moon,

More questions.

If I set up the MapR cluster in secure mode, how do I set up Zeppelin?

Thanks,

York


Re: Zeppelin architecture

York Huang
Hi Moon,

Sorry, a few more questions.

My cluster is a MapR cluster.

If I want to install Zeppelin on one edge node and have multiple users access it, how do I set up those users to run jobs and access data in the MapR cluster using their own accounts?

If I want to install Zeppelin on every user's desktop and let them access MapR from their own desktops, how do I install Zeppelin on their Windows desktops?

Is there any guide somewhere?

Thanks,

York
