Monitoring a Notebook in Spark UI

Monitoring a Notebook in Spark UI

Joshua Conlin
Hello,

I'm looking for documentation to better understand PySpark/Scala notebook execution in Spark. I typically see very long application runtimes; is there always a Spark "application" running for a notebook or Zeppelin session? The applications that are not actively being used from Zeppelin typically show very low resource utilization. Are these Spark applications tied to the Zeppelin user's session?

Also, how can I find out more about Hive, PySpark, and Scala interpreter concurrency? How many users/notebooks/paragraphs can execute these interpreters concurrently, and how is this tunable?

Any insight you can provide would be appreciated.

Thanks,

Josh

Re: Monitoring a Notebook in Spark UI

Jeff Zhang
Regarding how many Spark apps there are: it depends on the interpreter binding mode; you can refer to this document: http://zeppelin.apache.org/docs/0.9.0-preview1/usage/interpreter/interpreter_binding_mode.html
Internally, each Spark app runs a Scala shell to execute Scala code and a Python shell to execute PySpark code.

Regarding interpreter concurrency, it depends on how you define it: you can run one Spark app per user or per note, which is governed by the interpreter binding mode referenced above. You can also run multiple Spark SQL jobs concurrently within a single Spark app.
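For illustration, here is a rough sketch of where the binding mode lives on disk (field names from memory; the exact layout may differ across Zeppelin versions). Each interpreter setting in conf/interpreter.json carries an "option" block, and the perUser/perNote values of shared, scoped, or isolated determine how many Spark apps Zeppelin launches:

    "option": {
      "perUser": "isolated",
      "perNote": "shared"
    }

The same values can be chosen from the interpreter settings page in the Zeppelin UI, which is the usual way to change them.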

--
Best Regards

Jeff Zhang

RE: Monitoring a Notebook in Spark UI

stephane.davy

Hi Jeff,

> You can also run multiple Spark SQL jobs concurrently within a single Spark app.

Can you please elaborate on this? What I see (with Zeppelin 0.8) is that with a shared interpreter, each job is run one after another. When moving to one interpreter per user, many users can run a job at the same time, but each user can still run only one job at a time. How is it possible to run multiple SQL jobs concurrently in one Spark app?

Thanks,

Stéphane

Re: Monitoring a Notebook in Spark UI

Jeff Zhang
Hi Stéphane,

I mean running Spark SQL jobs concurrently via %spark.sql, simply by setting zeppelin.spark.concurrentSQL to true.

See the details here
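As a minimal sketch (the table names are hypothetical): with zeppelin.spark.concurrentSQL set to true in the %spark interpreter settings, two paragraphs like these can execute at the same time instead of queueing behind each other:

    %spark.sql
    -- long-running query; runs in parallel with the paragraph below
    SELECT count(*) FROM large_table_a

    %spark.sql
    -- submitted while the paragraph above is still executing
    SELECT count(*) FROM large_table_b

If I remember correctly, there is also a companion setting, zeppelin.spark.concurrentSQL.max, that caps how many SQL jobs may run at once.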

--
Best Regards

Jeff Zhang