Scheduler already terminated Exception


linxi zeng
Hi moon,

After changing some settings and restarting an interpreter, the interpreter's scheduler is terminated, and the RemoteInterpreterServer process should be stopped as well. But if the RemoteInterpreterServer does not shut down as expected, a "Scheduler already terminated" exception is thrown whenever we run paragraphs that use this interpreter (Spark, for example). Restarting the Zeppelin server then seems to be the only way to recover.

This has already happened several times, but I still have no idea how to reproduce it reliably. I was wondering whether we could restart the RemoteInterpreterServer when we catch this exception.

Do you have any ideas on how to solve this problem?


By the way, the detailed error output looks like this:

 INFO [2015-09-06 10:21:47,487] ({qtp1633200777-7462} NotebookServer.java[onMessage]:112) - RECEIVE << RUN_PARAGRAPH
 INFO [2015-09-06 10:21:47,493] ({qtp1633200777-7462} NotebookServer.java[broadcast]:264) - SEND >> NOTE
ERROR [2015-09-06 10:21:47,495] ({qtp1633200777-7462} NotebookServer.java[runParagraph]:640) - Exception from run
java.lang.RuntimeException: Scheduler already terminated
        at org.apache.zeppelin.scheduler.RemoteScheduler.submit(RemoteScheduler.java:124)
        at org.apache.zeppelin.notebook.Note.run(Note.java:282)
        at org.apache.zeppelin.socket.NotebookServer.runParagraph(NotebookServer.java:638)
        at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:137)
        at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:56)
        at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameHandler.onFrame(WebSocketConnectionRFC6455.java:835)
        at org.eclipse.jetty.websocket.WebSocketParserRFC6455.parseNext(WebSocketParserRFC6455.java:349)
        at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455.handle(WebSocketConnectionRFC6455.java:225)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
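
The recovery idea raised in this message (restart when the exception is caught, then run the paragraph again) could be sketched roughly as follows. This is only an illustration under stated assumptions: JobScheduler, TerminableScheduler, RestartingScheduler and the factory argument are hypothetical names invented for this sketch and are not Zeppelin classes. Only the exception message is taken from the stack trace above.

```java
import java.util.function.Supplier;

// Hypothetical sketch: none of these types are Zeppelin's real classes.
interface JobScheduler {
    void submit(Runnable job);
}

/** Stand-in for a scheduler that refuses work once it has been terminated. */
class TerminableScheduler implements JobScheduler {
    private boolean terminated = false;

    void terminate() {
        terminated = true;
    }

    @Override
    public void submit(Runnable job) {
        if (terminated) {
            // Same message as in the stack trace above.
            throw new RuntimeException("Scheduler already terminated");
        }
        job.run();
    }
}

/** Wraps a scheduler and replaces it when it turns out to be terminated. */
class RestartingScheduler implements JobScheduler {
    private final Supplier<JobScheduler> factory; // assumed way to obtain a fresh scheduler
    private JobScheduler current;
    private int restarts = 0;

    RestartingScheduler(JobScheduler initial, Supplier<JobScheduler> factory) {
        this.current = initial;
        this.factory = factory;
    }

    int restarts() {
        return restarts;
    }

    @Override
    public void submit(Runnable job) {
        try {
            current.submit(job);
        } catch (RuntimeException e) {
            if (!"Scheduler already terminated".equals(e.getMessage())) {
                throw e; // unrelated failure: do not mask it
            }
            // Assumed recovery step: a real fix would also have to restart
            // the remote interpreter process, not just swap the scheduler.
            current = factory.get();
            restarts++;
            current.submit(job); // retry once; a second failure propagates
        }
    }
}

class SchedulerRestartSketch {
    public static void main(String[] args) {
        TerminableScheduler stale = new TerminableScheduler();
        stale.terminate(); // simulate the state left behind by a failed restart

        RestartingScheduler scheduler =
                new RestartingScheduler(stale, TerminableScheduler::new);
        scheduler.submit(() -> System.out.println("paragraph ran"));
        System.out.println("restarts: " + scheduler.restarts());
    }
}
```

The retry is limited to a single attempt so that a scheduler which keeps failing surfaces the error instead of looping.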

Re: Scheduler already terminated Exception

moon
Administrator
If there is a way to reproduce the problem, that would help a lot.
Let me investigate this further.

I'm working on improving the interpreter process restart:
 https://github.com/Leemoonsoo/incubator-zeppelin/commit/3200b9aac26d394a67d496c3b209eb3cda046c4a
Once I know how to reproduce the "Scheduler already terminated" exception, I'll make a pull request together with this improvement.

Thanks,
moon

On Mon, Sep 7, 2015 at 5:44 AM linxi zeng <[hidden email]> wrote:

Re: Scheduler already terminated Exception

linxi zeng
Actually, there is a way to reproduce the problem (though maybe not a very suitable example):
(1) Modify dereference() in RemoteInterpreterProcess.java like this:

diff --git a/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java b/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java
index 534af27..e02b16a 100644
--- a/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java
+++ b/zeppelin-interpreter/src/main/java/org/apache/zeppelin/interpreter/remote/RemoteInterpreterProcess.java
@@ -146,7 +146,8 @@ public class RemoteInterpreterProcess implements ExecuteResultHandler {
   public int dereference() {
     synchronized (referenceCount) {
       int r = referenceCount.decrementAndGet();
-      if (r == 0) {
+      //if (r == 0) {
+      if (false) {
         logger.info("shutdown interpreter process");
         remoteInterpreterEventPoller.shutdown();


(2) Restart this interpreter in the interpreter settings:

Inline image 1

(3) Run a Spark paragraph:

Inline image 2
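
Why the patch in step (1) reproduces the problem can be illustrated with a small model of reference counting: the process is shut down only when the count reaches zero, so replacing that check with "if (false)" leaves the process alive after the last user has released it. The class below is a simplified, hypothetical stand-in, not the real RemoteInterpreterProcess. The reference() method, the running flag and the shutdownOnZero switch (which plays the role of the patch) are assumptions made for this sketch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a reference-counted interpreter process. */
class RefCountedProcess {
    private final AtomicInteger referenceCount = new AtomicInteger(0);
    private final boolean shutdownOnZero; // false models the "if (false)" patch
    private boolean running = false;

    RefCountedProcess(boolean shutdownOnZero) {
        this.shutdownOnZero = shutdownOnZero;
    }

    /** Registers one more user; the first user starts the process. */
    synchronized int reference() {
        if (referenceCount.get() == 0) {
            running = true;
        }
        return referenceCount.incrementAndGet();
    }

    /** Releases one user; the last user normally stops the process. */
    synchronized int dereference() {
        int r = referenceCount.decrementAndGet();
        if (shutdownOnZero && r == 0) {
            running = false; // the real code shuts the process down here
        }
        return r;
    }

    synchronized boolean isRunning() {
        return running;
    }
}

class RefCountSketch {
    public static void main(String[] args) {
        RefCountedProcess normal = new RefCountedProcess(true);
        normal.reference();
        normal.dereference();
        System.out.println("normal still running: " + normal.isRunning());

        RefCountedProcess patched = new RefCountedProcess(false);
        patched.reference();
        patched.dereference();
        System.out.println("patched still running: " + patched.isRunning());
    }
}
```

In the patched case the count is zero but the process is still running, which matches the situation described in this thread: the scheduler has been terminated while the RemoteInterpreterServer has not shut down.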



2015-09-09 23:13 GMT+08:00 moon soo Lee <[hidden email]>:


Re: Scheduler already terminated Exception

Sourav Mazumder
Hi Moon,

I can suggest another way to reproduce this.

1. Create a Spark interpreter with very little executor memory (say, 128 MB).

2. Use this interpreter to do something memory-intensive. For example, load a 20 GB data set and then run a select count(*). This will eventually kill the executor process, and I generally get a "RemoteInterpreter not found" or "Connection refused" error.

3. Now rerun the same paragraph that executes select count(*). You will get the "Scheduler already terminated" error.

Regards,
Sourav




On Thu, Sep 17, 2015 at 5:25 AM, linxi zeng <[hidden email]> wrote: