Multiple concurrent Spark notebooks


Multiple concurrent Spark notebooks

Mark Libucha
Hi everyone,

I've got Zeppelin running against a Cloudera/Yarn/Spark cluster and everything seems to be working fine. Very cool.

One minor issue, though. When one notebook is running, others queue up behind it. Is there a way to run multiple notebooks concurrently? Both notebooks are running the pyspark interpreter.

Thanks,

Mark


Re: Multiple concurrent Spark notebooks

Mohit Jaggi
Change your Spark settings so that the REPL does not get the whole cluster, e.g. by reducing the executor memory and CPU allocation.
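
For example, you could cap each notebook's share in the Spark interpreter properties along these lines (the numbers are purely illustrative and depend on your cluster; spark.executor.instances applies on YARN, while spark.cores.max plays the same role on a standalone cluster):

  spark.executor.memory     2g
  spark.executor.cores      2
  spark.executor.instances  4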

Mohit Jaggi
Founder,
Data Orchard LLC
www.dataorchardllc.com






Re: Multiple concurrent Spark notebooks

Mich Talebzadeh
Hi Mark,

Zeppelin talks to Spark through the Spark interpreter.

Edit the interpreter settings. By default Zeppelin uses local mode, as seen below:

[Screenshot: Spark interpreter settings showing the default local master and the spark.cores.max and spark.executor.memory properties]

You can of course change that to standalone mode by specifying

master spark://<IP_ADDRESS>:7077

and increasing spark.cores.max and spark.executor.memory in the same settings.
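
Concretely, the interpreter properties might then look something like this (the address is a placeholder and the sizes are illustrative, to be tuned to your cluster):

  master                  spark://<IP_ADDRESS>:7077
  spark.cores.max         8
  spark.executor.memory   4g

Keeping spark.cores.max below the cluster's total leaves cores free for a second notebook's application.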

HTH



Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com





Re: Multiple concurrent Spark notebooks

Mark Libucha
Mich, thanks for the suggestion. I tried your settings, but they did not solve the problem.

I'm running in yarn-client mode, not local or standalone, so the resources in the Spark cluster (which is very large) should not be an issue. Right?

The problem seems to be that Zeppelin is not submitting the second job to the Spark cluster at all.

Re: Multiple concurrent Spark notebooks

Andreas Lang
Hi Mark,

you may want to check the Spark interpreter settings. In the most recent version of Zeppelin you can set the interpreter mode to shared, isolated, or scoped:

Shared: a single interpreter and SparkContext for all notebooks (hence the queuing you see)
Isolated: every notebook gets its own interpreter and SparkContext
Scoped: every notebook gets its own interpreter, but they share a SparkContext

https://zeppelin.apache.org/docs/latest/interpreter/spark.html

Isolated is the most stable option for what you want to do, and shared is the most resource-efficient for the machine you run Zeppelin on.
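
If you want to verify which mode a notebook is actually in, a paragraph along these lines (a minimal sketch; it assumes the %pyspark interpreter with its usual sc variable) prints the underlying Spark application ID:

  %pyspark
  # One SparkContext corresponds to one Spark application. Under isolated
  # mode each notebook prints a different ID; under shared (or scoped)
  # mode all notebooks print the same ID.
  print(sc.applicationId)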

Mohit's comment may still be important if you have spark.dynamicAllocation.enabled set to true and no limits on the number or resources of executors.
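
For instance, with dynamic allocation on, caps along these lines (the values are again only illustrative; dynamic allocation also needs the external shuffle service) keep one notebook from absorbing every executor:

  spark.dynamicAllocation.enabled       true
  spark.shuffle.service.enabled         true
  spark.dynamicAllocation.maxExecutors  10
  spark.executor.cores                  2
  spark.executor.memory                 4g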

Andreas


Re: Multiple concurrent Spark notebooks

Mark Libucha
That was it! Thanks so much, Andreas. I can't believe I had overlooked that drop-down in the interpreter settings. Mohit and Mich probably assumed I had tried that already.

Thanks everyone.

Mark
