Other paragraphs do not wait for %sh paragraphs to finish.

murexconsultant
I often have notebooks whose first paragraph is a %sh paragraph. It copies a file from another server with scp, and a number of Spark or Spark SQL paragraphs follow it.

If I click run-all at the top of the notebook, the first paragraph (%sh) kicks off as expected, but the second paragraph (%spark) starts at the same time. The others go into the pending state and start once the Spark one has completed.

Is this a bug? Or am I doing something wrong?

Thanks

Re: Other paragraphs do not wait for %sh paragraphs to finish.

moon
Administrator
Hi,

That's the expected behavior at the moment. The reason is that each interpreter has its own scheduler (either FIFO or Parallel), and run-all just submits every paragraph to the scheduler of its target interpreter.
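A toy model of that behavior, as a minimal Python sketch rather than Zeppelin's actual scheduler code. It assumes every interpreter uses a FIFO scheduler that runs one paragraph at a time; `run_all` and the paragraph names p1 to p4 are hypothetical.

```python
from collections import defaultdict, deque


def run_all(paragraphs):
    """Submit every paragraph to its interpreter's FIFO queue, as run-all does.

    Returns (running, pending): the paragraphs that start immediately and
    those left waiting behind another paragraph of the same interpreter.
    """
    queues = defaultdict(deque)
    for name, interpreter in paragraphs:
        queues[interpreter].append(name)
    # The head of each queue starts right away, independently of other queues.
    running = [queue[0] for queue in queues.values()]
    pending = [name for queue in queues.values() for name in list(queue)[1:]]
    return running, pending


note = [("p1", "sh"), ("p2", "spark"), ("p3", "spark"), ("p4", "spark")]
running, pending = run_all(note)
print(running)  # ['p1', 'p2']
print(pending)  # ['p3', 'p4']
```

Because p1 (%sh) and p2 (%spark) sit at the head of different queues, both start immediately, while p3 and p4 wait behind p2. Nothing in this model makes one queue wait for another.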

I think we can add a feature such as run-all-sequentially.
Would you mind filing a JIRA issue?

Thanks,
moon

Re: Other paragraphs do not wait for %sh paragraphs to finish.

Rick Moritz
This actually calls for a way to define dependencies between the paragraphs of a note, so the scheduler can decide which ones to run simultaneously.
I suggest a simple counter of dependency levels, which by default increases with every new paragraph and can be decremented to let paragraphs run simultaneously. Run-all then submits each level's paragraphs to their target interpreters, waits for all of them to finish, and then starts the next level's paragraphs.
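A minimal Python sketch of that level-by-level idea, under stated assumptions: paragraphs are modeled as plain callables, the level numbers are assigned by hand, and `run_by_level` is a hypothetical name, not an existing Zeppelin feature.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def run_by_level(paragraphs):
    """Run (level, name, task) tuples level by level.

    Paragraphs that share a level run concurrently; the next level starts
    only after every task in the current level has returned.
    """
    finished = []
    ordered = sorted(paragraphs, key=lambda p: p[0])  # stable sort keeps note order
    for level, group in groupby(ordered, key=lambda p: p[0]):
        group = list(group)
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            futures = [pool.submit(task) for _, _, task in group]
            for future in futures:
                future.result()  # re-raises here if a paragraph failed
        finished.append((level, [name for _, name, _ in group]))
    return finished


log = []
note = [
    (1, "fetch", lambda: log.append("fetch")),
    (2, "load", lambda: log.append("load")),
    (3, "sql_a", lambda: log.append("sql_a")),
    (3, "sql_b", lambda: log.append("sql_b")),  # same level as sql_a
]
waves = run_by_level(note)
print(waves)  # [(1, ['fetch']), (2, ['load']), (3, ['sql_a', 'sql_b'])]
```

Here `fetch` always finishes before `load` starts, while `sql_a` and `sql_b` may run in either order because they share level 3.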


Re: Other paragraphs do not wait for %sh paragraphs to finish.

Jeff Zhang

That's correct. We need a way to define dependencies between paragraphs, e.g. %spark(deps=p1), so that we can build a DAG for the whole pipeline.
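As a rough illustration of what such a DAG would give the scheduler, here is a Python sketch using the standard library's `graphlib` (Python 3.9+). The `deps=` syntax above is only a proposal, so this sketch skips parsing and assumes the dependencies are already available as a dict; `execution_waves` and the paragraph IDs are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_waves(deps):
    """Group paragraphs into waves that could run concurrently.

    deps maps each paragraph ID to the IDs it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


deps = {"p1": [], "p2": ["p1"], "p3": ["p2"], "p4": ["p2"]}
waves = execution_waves(deps)
print(waves)  # [['p1'], ['p2'], ['p3', 'p4']]
```

A real scheduler would start each paragraph as soon as its own dependencies finish instead of waiting for a whole wave, but the waves show which paragraphs are independent of each other.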





Re: Other paragraphs do not wait for %sh paragraphs to finish.

Ruslan Dautkhanov
In reply to this post by moon
Filed https://issues.apache.org/jira/browse/ZEPPELIN-2368 

We have had users ask for the same thing. The current behavior forced them to run paragraphs one by one manually.

Re: Other paragraphs do not wait for %sh paragraphs to finish.

Ruslan Dautkhanov
In reply to this post by Jeff Zhang
Instead of introducing a full-blown DAG of dependencies, a simpler solution might be to introduce a paragraph-level boolean property, "depends on previous paragraph", so that in a run-all-paragraphs run, a paragraph with this property set would not be scheduled until the previous one has completed without errors.

It would be a compromise between a completely sequential run and having a way to define a DAG.
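A minimal Python sketch of how such a flag could behave, assuming paragraphs are plain callables and that a paragraph is skipped when the one above it failed or was itself skipped. `run_all` and the paragraph names are hypothetical; this is not an existing Zeppelin property.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(paragraphs):
    """Run (name, task, depends_on_previous) tuples in note order."""
    futures = {}     # name -> Future, for every paragraph that was submitted
    skipped = []
    previous = None  # name of the paragraph directly above the current one
    with ThreadPoolExecutor(max_workers=4) as pool:
        for name, task, depends_on_previous in paragraphs:
            blocked = False
            if depends_on_previous and previous is not None:
                if previous in futures:
                    # exception() blocks until the paragraph above has finished.
                    blocked = futures[previous].exception() is not None
                else:
                    blocked = True  # the paragraph above was itself skipped
            if blocked:
                skipped.append(name)
            else:
                futures[name] = pool.submit(task)
            previous = name
    failed = [n for n, f in futures.items() if f.exception() is not None]
    completed = [n for n in futures if n not in failed]
    return completed, failed, skipped


def failing_scp():
    raise RuntimeError("scp failed")


note = [
    ("fetch", failing_scp, False),
    ("load", lambda: None, True),      # waits for fetch
    ("report", lambda: None, True),    # waits for load
    ("comments", lambda: None, False), # independent of the others
]
completed, failed, skipped = run_all(note)
print(completed, failed, skipped)  # ['comments'] ['fetch'] ['load', 'report']
```

Paragraphs without the flag are submitted right away, so they still run in parallel with whatever is above them.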



Re: Other paragraphs do not wait for %sh paragraphs to finish.

Jeff Zhang

"Depends on previous paragraph" could be the default behavior if no deps are specified. Specifying dependencies explicitly could improve performance. For example, in the Spark tutorial note, the three SQL paragraphs could run at the same time, independently of each other.

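To illustrate the difference, here is a small Python sketch that assumes a paragraph with no explicit deps depends on the one above it, and that explicit deps only name earlier paragraphs. `resolve_deps`, `earliest_step`, and the paragraph names are hypothetical.

```python
def resolve_deps(paragraphs):
    """Fill in missing deps: None means "depends on the previous paragraph"."""
    resolved = {}
    previous = None
    for name, deps in paragraphs:
        if deps is None:
            deps = [previous] if previous is not None else []
        resolved[name] = list(deps)
        previous = name
    return resolved


def earliest_step(resolved):
    """Earliest step (0-based) at which each paragraph could start."""
    steps = {}
    for name, deps in resolved.items():  # note order, so deps are already placed
        steps[name] = 1 + max((steps[d] for d in deps), default=-1)
    return steps


# No deps specified anywhere: a fully sequential chain.
implicit = [("load", None), ("sql1", None), ("sql2", None), ("sql3", None)]
print(earliest_step(resolve_deps(implicit)))
# {'load': 0, 'sql1': 1, 'sql2': 2, 'sql3': 3}

# Each SQL paragraph declares that it only needs the load paragraph.
explicit = [("load", None), ("sql1", ["load"]), ("sql2", ["load"]), ("sql3", ["load"])]
print(earliest_step(resolve_deps(explicit)))
# {'load': 0, 'sql1': 1, 'sql2': 1, 'sql3': 1}
```

With the default, the note takes four steps; with explicit deps, the three SQL paragraphs share step 1 and could run concurrently.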

