Zeppelin Stops Loading Notes

Zeppelin Stops Loading Notes

Paul Brenner
We have a team of 5 users who all share one Zeppelin server. A few times lately, Zeppelin notes have stopped responding, and when we refresh the note's page, only the Zeppelin header loads, with no note. When I look at the logs I see:
 INFO [2017-08-18 21:23:06,569] ({qtp1286783232-14114} NotebookServer.java[sendNote]:705) - New operation from 10.201.12.26 : 55178 : nshah : GET_NOTE : 2CR2ANDEX
 INFO [2017-08-18 21:24:05,740] ({qtp1286783232-14115} NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 : 57366. (1001) Idle Timeout
 INFO [2017-08-18 21:24:08,084] ({qtp1286783232-14121} NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 : 57461. (1001) Idle Timeout
 INFO [2017-08-18 21:25:10,133] ({qtp1286783232-14122} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 INFO [2017-08-18 21:25:10,157] ({qtp1286783232-14122} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 INFO [2017-08-18 21:25:10,172] ({qtp1286783232-14122} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 WARN [2017-08-18 21:25:10,192] ({qtp1286783232-14122} SecurityRestApi.java[ticket]:87) - {"status":"OK","message":"","body":{"principal":"pbrenner","ticket":"5f717a1a-46df-4178-bc9a-690f66064d0e","roles":"[]"}}
 INFO [2017-08-18 21:25:10,252] ({qtp1286783232-14123} NotebookServer.java[onOpen]:156) - New connection from 10.201.12.48 : 62413
ERROR [2017-08-18 21:25:10,353] ({qtp1286783232-14122} NotebookServer.java[onMessage]:357) - Can't handle message
java.lang.NumberFormatException: For input string: "false"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.zeppelin.conf.ZeppelinConfiguration.getInt(ZeppelinConfiguration.java:213)
at org.apache.zeppelin.conf.ZeppelinConfiguration.getInt(ZeppelinConfiguration.java:208)
at org.apache.zeppelin.conf.ZeppelinConfiguration.dumpConfigurations(ZeppelinConfiguration.java:496)
at org.apache.zeppelin.socket.NotebookServer.sendAllConfigurations(NotebookServer.java:1680)
at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:318)
at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

Does anyone have any idea what is going on, or how we could troubleshoot it?

Paul Brenner
DATA SCIENTIST
(217) 390-3033  

PlaceIQ:Landmark by PlaceIQ

Re: Zeppelin Stops Loading Notes

moon
Administrator
Hi,

One of the configuration values in your conf/zeppelin-env.sh or conf/zeppelin-site.xml seems to be set to "false" where a number is expected.

Do you have an environment variable or property set to "false" for any of the settings below?

ZEPPELIN_PORT, zeppelin.server.port
ZEPPELIN_SSL_PORT, zeppelin.server.ssl.port
ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT, zeppelin.interpreter.connect.timeout
ZEPPELIN_INTERPRETER_MAX_POOL_SIZE, zeppelin.interpreter.max.poolsize
ZEPPELIN_INTERPRETER_OUTPUT_LIMIT, zeppelin.interpreter.output.limit
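
A check like this could be scripted. The following is only a rough sketch in Python, not part of Zeppelin: the function name find_non_integer_settings is made up for illustration, and it assumes zeppelin-env.sh uses plain "export NAME=value" lines (no shell expansion, no multi-line values). It covers only the five environment variables listed above, not the properties in zeppelin-site.xml.

```python
import re
from typing import Dict

# Environment variables the list above names as needing numeric values.
INT_VARS = (
    "ZEPPELIN_PORT",
    "ZEPPELIN_SSL_PORT",
    "ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT",
    "ZEPPELIN_INTERPRETER_MAX_POOL_SIZE",
    "ZEPPELIN_INTERPRETER_OUTPUT_LIMIT",
)

# Matches lines such as: export ZEPPELIN_SSL_PORT=false
ASSIGN_RE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def find_non_integer_settings(env_sh_text: str) -> Dict[str, str]:
    """Return {variable: value} for listed variables whose value is not an integer.

    Hypothetical helper: a simplified line-by-line scan, not a shell parser.
    If a variable is assigned more than once, the last assignment wins.
    """
    bad: Dict[str, str] = {}
    for line in env_sh_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out assignments are ignored
        match = ASSIGN_RE.match(line)
        if not match:
            continue
        name, value = match.group(1), match.group(2)
        if name not in INT_VARS:
            continue
        # Drop a trailing "# comment" and surrounding quotes, if any.
        value = value.split("#", 1)[0].strip().strip("\"'")
        if re.fullmatch(r"[+-]?\d+", value):
            bad.pop(name, None)
        else:
            bad[name] = value
    return bad


sample = """
export ZEPPELIN_PORT=8080
export ZEPPELIN_SSL_PORT=false
# export ZEPPELIN_INTERPRETER_OUTPUT_LIMIT=off
"""
print(find_non_integer_settings(sample))  # {'ZEPPELIN_SSL_PORT': 'false'}
```

To run it against a real file, pass in the file contents, for example open("conf/zeppelin-env.sh").read().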

Thanks,
moon


Re: Zeppelin Stops Loading Notes

Paul Brenner
You were correct. We had "export ZEPPELIN_SSL_PORT=false" in our zeppelin-env.sh. I'm going to comment that out. I suspect it is actually unrelated to the behavior we are seeing where pages stop loading, though. Has anyone else seen this happen?

I’ll report back if that happens again after the fix.



Re: Zeppelin Stops Loading Notes

Ben Vogan
I have seen Zeppelin get into this state once. I restarted it without investigating the logs, however, so I don't have anything useful to go on as to why.

--Ben

--
BENJAMIN VOGAN | Data Platform Team Lead


Re: Zeppelin Stops Loading Notes

Fabian Böhnlein
Hi Paul, Ben,

We also see this happen regularly. It's more likely to happen when a handful of people are using it.

We mostly run one Spark interpreter per person. We also don't observe anything in the logs. The 'header' that you mentioned is actually still in the cache.

Sometimes it's specific notes that don't load.
Sometimes there's a hanging Spark interpreter; once it's killed, notes load again.

We're pretty clueless about it.

Are there any front-end related logs we could enable to find out more?


Re: Zeppelin Stops Loading Notes

Geoffrey Cheng
We have the same issue. Usually when multiple people are using it, only the header loads.

We tried but couldn't find a solution, so we restart every single time. In fact, we have to restart at least daily.

On Oct 12, 2017 2:40 AM, "Fabian Böhnlein" <[hidden email]> wrote:
Hi Paul, Ben,

we also see this happen regularly. It's more likely to happen when a handful of people are using it.

We mostly run one spark interpreter per person. We also don't observe anything in the logs. The 'header' that you mentioned is actually still in the cache.

Sometimes it's specific notes that don't load.
Sometimes there's a hanging Spark interpreter, once it's killed notes load again.

We're pretty clueless about it.

Any front-end related logs we could enable to find out more?

On Sat, 19 Aug 2017 at 20:19 Ben Vogan <[hidden email]> wrote:
I have seen Zeppelin get into this state once.  I restarted it without investigating the logs however so I don't have anything useful to go on as to why.

--Ben

On Sat, Aug 19, 2017 at 8:17 AM, Paul Brenner <[hidden email]> wrote:
You were correct. We had "export ZEPPELIN_SSL_PORT=false” in our zeppelin-env.sh. I’m going to comment that out. I suspect it is actually unrelated to the behavior we are seeing where pages stop loading though. Anyone else see this happen? 

I’ll report back if that happens again after the fix.

Paul Brenner
DATA SCIENTIST
<a href="tel:(217)%20390-3033" value="+12173903033" target="_blank">(217) 390-3033  

PlaceIQ:Landmark by PlaceIQ

On Fri, Aug 18, 2017 at 6:37 PM moon soo Lee <[hidden email]> wrote:
Hi,

One of configuration value in your conf/zeppelin-env.sh or conf/zeppelin-site.xml seems "false" which expected to be to a number.

Do you have any environment variable or property set to "false" for the configurations below?

ZEPPELIN_PORT, zeppelin.server.port
ZEPPELIN_SSL_PORT, zeppelin.server.ssl.port
ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT, zeppelin.interpreter.connect.timeout
ZEPPELIN_INTERPRETER_MAX_POOL_SIZE, zeppelin.interpreter.max.poolsize
ZEPPELIN_INTERPRETER_OUTPUT_LIMIT, zeppelin.interpreter.output.limit

Thanks,
moon

On Fri, Aug 18, 2017 at 2:30 PM Paul Brenner <[hidden email]> wrote:
We have a team of 5 users who all use the same zeppelin server. Lately a few times we have run into a case where zeppelin notes stop responding and then when we try refreshing the webpage for the note all that loads is the zeppelin header with no note. When I look at the logs I see:
 INFO [2017-08-18 21:23:06,569] ({qtp1286783232-14114} NotebookServer.java[sendNote]:705) - New operation from 10.201.12.26 : 55178 : nshah : GET_NOTE : 2CR2ANDEX
 INFO [2017-08-18 21:24:05,740] ({qtp1286783232-14115} NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 : 57366. (1001) Idle Timeout
 INFO [2017-08-18 21:24:08,084] ({qtp1286783232-14121} NotebookServer.java[onClose]:363) - Closed connection to 10.201.12.22 : 57461. (1001) Idle Timeout
 INFO [2017-08-18 21:25:10,133] ({qtp1286783232-14122} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 INFO [2017-08-18 21:25:10,157] ({qtp1286783232-14122} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 INFO [2017-08-18 21:25:10,172] ({qtp1286783232-14122} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 WARN [2017-08-18 21:25:10,192] ({qtp1286783232-14122} SecurityRestApi.java[ticket]:87) - {"status":"OK","message":"","body":{"principal":"pbrenner","ticket":"5f717a1a-46df-4178-bc9a-690f66064d0e","roles":"[]"}}
 INFO [2017-08-18 21:25:10,252] ({qtp1286783232-14123} NotebookServer.java[onOpen]:156) - New connection from 10.201.12.48 : 62413
ERROR [2017-08-18 21:25:10,353] ({qtp1286783232-14122} NotebookServer.java[onMessage]:357) - Can't handle message
java.lang.NumberFormatException: For input string: "false"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:580)
at java.lang.Integer.parseInt(Integer.java:615)
at org.apache.zeppelin.conf.ZeppelinConfiguration.getInt(ZeppelinConfiguration.java:213)
at org.apache.zeppelin.conf.ZeppelinConfiguration.getInt(ZeppelinConfiguration.java:208)
at org.apache.zeppelin.conf.ZeppelinConfiguration.dumpConfigurations(ZeppelinConfiguration.java:496)
at org.apache.zeppelin.socket.NotebookServer.sendAllConfigurations(NotebookServer.java:1680)
at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:318)
at org.apache.zeppelin.socket.NotebookSocket.onWebSocketText(NotebookSocket.java:59)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:128)
at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:65)
at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:122)
at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:161)
at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:309)
at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:214)
at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:220)
at org.eclipse.jetty.websocket.common.Parser.parse(Parser.java:258)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.readParse(AbstractWebSocketConnection.java:632)
at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:480)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)

Does anyone have any idea what is going on or how we could troubleshoot?

Paul Brenner
DATA SCIENTIST
(217) 390-3033

PlaceIQ:Landmark by PlaceIQ




--
BENJAMIN VOGAN | Data Platform Team Lead


Re: Zeppelin Stops Loading Notes

Paul Brenner
Does this issue need a Jira ticket? The problem is that I have no idea how to reproduce it and I’m not sure if there is anything in the logs that is relevant.

Any ideas how we can produce an actionable Jira ticket out of this?

Paul Brenner
DATA SCIENTIST
(217) 390-3033  

PlaceIQ:Landmark by PlaceIQ

On Thu, Oct 12, 2017 at 8:31 AM Geoffrey Cheng <[hidden email]> wrote:
We have the same issue. Usually when multiple people are using it, only the header loads.

We tried but couldn't find a solution, so we restart every single time. In fact, we have to restart at least daily.

On Oct 12, 2017 2:40 AM, "Fabian Böhnlein" <[hidden email]> wrote:
Hi Paul, Ben,

we also see this happen regularly. It's more likely to happen when a handful of people are using it.

We mostly run one spark interpreter per person. We also don't observe anything in the logs. The 'header' that you mentioned is actually still in the cache.

Sometimes it's specific notes that don't load.
Sometimes there's a hanging Spark interpreter; once it's killed, notes load again.

We're pretty clueless about it.

Any front-end related logs we could enable to find out more?

On Sat, 19 Aug 2017 at 20:19 Ben Vogan <[hidden email]> wrote:
I have seen Zeppelin get into this state once. I restarted it without investigating the logs, however, so I don't have anything useful to go on as to why.

--Ben

On Sat, Aug 19, 2017 at 8:17 AM, Paul Brenner <[hidden email]> wrote:
You were correct. We had "export ZEPPELIN_SSL_PORT=false" in our zeppelin-env.sh. I’m going to comment that out. I suspect it is actually unrelated to the behavior we are seeing where pages stop loading, though. Has anyone else seen this happen?

I’ll report back if that happens again after the fix.

Paul Brenner
DATA SCIENTIST
(217) 390-3033

PlaceIQ:Landmark by PlaceIQ

On Fri, Aug 18, 2017 at 6:37 PM moon soo Lee <[hidden email]> wrote:
Hi,

One of the configuration values in your conf/zeppelin-env.sh or conf/zeppelin-site.xml seems to be "false" where a number is expected.

Do you have any environment variable or property set to "false" for the configurations below?

ZEPPELIN_PORT, zeppelin.server.port
ZEPPELIN_SSL_PORT, zeppelin.server.ssl.port
ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT, zeppelin.interpreter.connect.timeout
ZEPPELIN_INTERPRETER_MAX_POOL_SIZE, zeppelin.interpreter.max.poolsize
ZEPPELIN_INTERPRETER_OUTPUT_LIMIT, zeppelin.interpreter.output.limit

Thanks,
moon
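
The check moon describes could be sketched roughly as follows in Python. This is only an illustration: it assumes the values are set through plain `NAME=value` or `export NAME=value` lines in conf/zeppelin-env.sh, the function name `find_non_integer_settings` is made up for this example, and the parsing is deliberately simplified (no variable expansion, no zeppelin-site.xml). The variable names are the ones from the list above.

```python
import re

# Environment variables from the list above that are expected to hold integers.
INT_SETTINGS = (
    "ZEPPELIN_PORT",
    "ZEPPELIN_SSL_PORT",
    "ZEPPELIN_INTERPRETER_CONNECT_TIMEOUT",
    "ZEPPELIN_INTERPRETER_MAX_POOL_SIZE",
    "ZEPPELIN_INTERPRETER_OUTPUT_LIMIT",
)

# Matches lines such as: export ZEPPELIN_SSL_PORT=false
ASSIGN_RE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def find_non_integer_settings(env_text):
    """Return {name: value} for integer settings whose value is not an integer.

    Hypothetical helper, not part of Zeppelin. env_text is the content of a
    zeppelin-env.sh style file.
    """
    bad = {}
    for line in env_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out lines have no effect
        match = ASSIGN_RE.match(line)
        if not match:
            continue
        name, raw = match.group(1), match.group(2)
        if name not in INT_SETTINGS:
            continue
        # Drop a trailing comment and surrounding quotes (simplified parsing).
        value = raw.split("#", 1)[0].strip().strip("\"'")
        if re.fullmatch(r"[+-]?\d+", value):
            bad.pop(name, None)  # a later valid assignment overrides an earlier bad one
        else:
            bad[name] = value
    return bad
```

Calling `find_non_integer_settings(open("conf/zeppelin-env.sh").read())` on a file containing `export ZEPPELIN_SSL_PORT=false` would report that variable, which is the kind of value that makes `Integer.parseInt` throw the `NumberFormatException` shown in the stack trace.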


RE: Zeppelin Stops Loading Notes

Belousov Maksim Eduardovich

Paul, Ben, Fabian,

Please share your workload at the time when notes are not loading.

How many interpreters were running at that moment?

You can count all started interpreters from the Linux command line with:

ps -ef | grep ZeppelinServer | grep -v grep | awk "NR==1" | awk -F' ' '{print $2}' | xargs ps -f --ppid | wc -l

 

And the started Spark interpreters:

ps -ef | grep ZeppelinServer | grep -v grep | awk "NR==1" | awk -F' ' '{print $2}' | xargs ps -f --ppid | grep spark | wc -l
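
The same counts could also be obtained with a small Python sketch like the one below. It assumes a procps-style `ps` (as on most Linux systems) and that interpreters run as direct children of the process whose command line contains `ZeppelinServer`, which is what the pipelines above rely on. The function names `list_processes` and `count_children` are made up for this example. Also note that `ps -f` prints a header line by default, so the first `wc -l` figure above is probably one higher than the number of child processes; the sketch counts processes only.

```python
import subprocess


def list_processes():
    """Return `ps` output with one 'PID PPID ARGS' line per process, no header."""
    # Empty column headers (pid=, ppid=, args=) suppress the header line.
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "args="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def count_children(ps_output, parent_marker="ZeppelinServer", child_filter=None):
    """Count direct children of the first process whose command line
    contains parent_marker, optionally keeping only children whose
    command line contains child_filter."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        rows.append(tuple(parts))  # (pid, ppid, args)

    # Like awk "NR==1" in the pipelines above: use the first matching process.
    parent_pid = next((pid for pid, _, args in rows if parent_marker in args), None)
    if parent_pid is None:
        return 0

    return sum(
        1
        for _, ppid, args in rows
        if ppid == parent_pid and (child_filter is None or child_filter in args)
    )
```

For example, `count_children(list_processes())` would give the number of started interpreters and `count_children(list_processes(), child_filter="spark")` the Spark ones.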

 

 


Maksim Belousov
Architect

Reporting and Data Marts Department

Data Warehouse and Reporting Division
Tel.: +7 495 648-10-00, ext. 2271

 

Re: Zeppelin Stops Loading Notes

Fabian Böhnlein

Usually 2-3 interpreters are running and in use, where multiple users might be using the same interpreter in the Per Note, Scoped setting.

Though it might also happen with just 1-2 interpreters running and only a single user on the UI.


Re: Zeppelin Stops Loading Notes

Paul Brenner
Although zeppelin has died a few times since this email came up, today was the first time where I was able to actually check the number of interpreters. 

All started interpreters:
7
Spark started interpreters:
5

That doesn’t seem unreasonable to me.

Here is what top is showing me, which doesn’t look like much load to me:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30815 yarn      20   0 5021872 1.318g  35740 S   2.0  8.5  43:29.53 java
14828 yarn      20   0 4882384 991068  18944 S   0.7  6.1  21:27.67 java
37266 yarn      20   0 4991372 1.025g  35696 S   0.7  6.6  15:50.97 java
39201 yarn      20   0 4981424 1.221g  34944 S   0.7  7.9  14:00.73 java
43625 yarn      20   0 4970028 1.137g  34012 S   0.7  7.3   6:30.84 java
29054 yarn      20   0 4895580 1.509g  39428 S   0.3  9.7 112:33.44 java
46094 yarn      20   0 4092848 137112  12884 S   0.3  0.8   2:27.30 java
29042 yarn      20   0  113124   1580   1320 S   0.0  0.0   0:00.00 interpreter.sh
29053 yarn      20   0  113120    648    392 S   0.0  0.0   0:00.00 interpreter.sh
30803 yarn      20   0  113124   1576   1320 S   0.0  0.0   0:00.00 interpreter.sh
30814 yarn      20   0  113120    644    392 S   0.0  0.0   0:00.00 interpreter.sh
37253 yarn      20   0  113124   1564   1308 S   0.0  0.0   0:00.00 interpreter.sh
37265 yarn      20   0  113120    652    396 S   0.0  0.0   0:00.00 interpreter.sh
39188 yarn      20   0  113124   1580   1320 S   0.0  0.0   0:00.00 interpreter.sh
39200 yarn      20   0  113120    648    392 S   0.0  0.0   0:00.00 interpreter.sh
43612 yarn      20   0  113124   1576   1320 S   0.0  0.0   0:00.00 interpreter.sh
43624 yarn      20   0  113120    644    392 S   0.0  0.0   0:00.00 interpreter.sh
46636 yarn      20   0  113124   1516   1268 S   0.0  0.0   0:00.00 interpreter.sh
46647 yarn      20   0  113120    848    600 S   0.0  0.0   0:00.00 interpreter.sh
46648 yarn      20   0 4026284 150560  13024 S   0.0  0.9   1:56.46 java

Finally, here is what I see in the logs. I don’t know what is going on with that authorization cache complaint, but I assume it is not related?

 INFO [2017-10-17 17:53:08,312] ({qtp1286783232-8117} NotebookServer.java[onOpen]:156) - New connection from 10.201.12.29 : 54311
 INFO [2017-10-17 17:53:08,394] ({qtp1286783232-8113} NotebookServer.java[sendNote]:705) - New operation from 10.201.12.29 : 54311 : gprabhu : GET_NOTE : 2CVZZ1XWN
 INFO [2017-10-17 17:53:09,297] ({qtp1286783232-8117} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 INFO [2017-10-17 17:53:09,317] ({qtp1286783232-8117} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 INFO [2017-10-17 17:53:09,351] ({qtp1286783232-8117} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 WARN [2017-10-17 17:53:09,381] ({qtp1286783232-8117} SecurityRestApi.java[ticket]:87) - {"status":"OK","message":"","body":{"principal":"gprabhu","ticket":"d7d18244-99c5-4eeb-941c-243dc7cc0ca3","roles":"[]"}}

Paul Brenner
DATA SCIENTIST
(217) 390-3033  

PlaceIQ:Landmark by PlaceIQ

Re: Zeppelin Stops Loading Notes

Paul Brenner
As a band-aid for this, is there any way to restart the Zeppelin web server without killing running Spark jobs? I suppose there would still be a risk that anyone actively editing a cell might lose their edits, but this would be easier than finding a time when multiple people aren’t actively running jobs.

Paul Brenner
DATA SCIENTIST
(217) 390-3033  

PlaceIQ:Landmark by PlaceIQ

Re: Zeppelin Stops Loading Notes

Geoffrey Cheng
This is the exception that I see EVERY TIME Zeppelin fails to load notes on the website (only the header shows).

I don't know where to create a ticket for this.  Can someone help?


 INFO [2017-10-20 19:16:30,079] ({qtp1967892594-15869} NotebookServer.java[onOpen]:156) - New connection from 100.2.8.84 : 57366
 WARN [2017-10-20 19:16:42,555] ({qtp1967892594-15870} LoginRestApi.java[logout]:131) - {"status":"UNAUTHORIZED","message":"","body":""}
 WARN [2017-10-20 19:16:42,620] ({qtp1967892594-15870} ServletHandler.java[doHandle]:620) -
javax.servlet.ServletException: Filtered request failed.
        at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:384)
        at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
        at org.apache.zeppelin.server.CorsFilter.doFilter(CorsFilter.java:72)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:499)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:544)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.AbstractMethodError: javax.ws.rs.core.Response.getStatusInfo()Ljavax/ws/rs/core/Response$StatusType;
        at javax.ws.rs.WebApplicationException.validate(WebApplicationException.java:186)
        at javax.ws.rs.ClientErrorException.<init>(ClientErrorException.java:88)
        at org.apache.cxf.jaxrs.utils.JAXRSUtils.findTargetMethod(JAXRSUtils.java:503)
        at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:198)
        at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:90)
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:272)
        at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
        at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:239)
        at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:248)
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:222)
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:153)
        at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:167)
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:286)
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:211)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:262)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:812)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
        at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
        at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
        at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
        at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
        at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
        at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:449)
        at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
        at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
        at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
        at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:383)
        at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
        ... 22 more
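
One possible reading of the trace above, offered as a guess rather than a diagnosis: `Response.getStatusInfo()` was added in JAX-RS 2.0, so an `AbstractMethodError` on it often means two different JAX-RS versions are on the classpath (a 2.0 API class calling into a `Response` implementation built against an older API). A minimal sketch for checking which jars ship that class follows. The helper name `find_class_in_jars` is made up for illustration, the directory to scan is an assumption about your install, and the sketch needs `unzip`.

```shell
# Hypothetical helper (not part of Zeppelin): list every .jar under a
# directory whose contents include a given class file entry.
# Usage: find_class_in_jars DIR CLASS_ENTRY
find_class_in_jars() {
  find "$1" -type f -name '*.jar' | while IFS= read -r jar; do
    # "unzip -l" prints the archive listing; grep -F matches the entry literally.
    if unzip -l "$jar" 2>/dev/null | grep -q -F "$2"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

For example, `find_class_in_jars "$ZEPPELIN_HOME/lib" javax/ws/rs/core/Response.class` (assuming the server jars live under `$ZEPPELIN_HOME/lib`). More than one jar in the output would be consistent with a version conflict, though it would not prove one.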




Geoff


On Tue, Oct 17, 2017 at 2:07 PM, Paul Brenner <[hidden email]> wrote:
As a band-aid for this, is there any way to restart the Zeppelin web server without killing running Spark jobs? I suppose there would still be a risk that anyone actively editing a cell might lose their edits, but this would be easier than finding a time when multiple people aren’t actively running jobs.

Paul Brenner
DATA SCIENTIST
(217) 390-3033

PlaceIQ:Landmark by PlaceIQ

On Tue, Oct 17, 2017 at 1:57 PM Paul Brenner <[hidden email]> wrote:
Although Zeppelin has died a few times since this email came up, today was the first time I was actually able to check the number of interpreters.

All started interpreters:
7
Spark started interpreters:
5

That doesn’t seem unreasonable to me.

Here is what top is showing me, which doesn’t look like much load to me:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30815 yarn      20   0 5021872 1.318g  35740 S   2.0  8.5  43:29.53 java
14828 yarn      20   0 4882384 991068  18944 S   0.7  6.1  21:27.67 java
37266 yarn      20   0 4991372 1.025g  35696 S   0.7  6.6  15:50.97 java
39201 yarn      20   0 4981424 1.221g  34944 S   0.7  7.9  14:00.73 java
43625 yarn      20   0 4970028 1.137g  34012 S   0.7  7.3   6:30.84 java
29054 yarn      20   0 4895580 1.509g  39428 S   0.3  9.7 112:33.44 java
46094 yarn      20   0 4092848 137112  12884 S   0.3  0.8   2:27.30 java
29042 yarn      20   0  113124   1580   1320 S   0.0  0.0   0:00.00 interpreter.sh
29053 yarn      20   0  113120    648    392 S   0.0  0.0   0:00.00 interpreter.sh
30803 yarn      20   0  113124   1576   1320 S   0.0  0.0   0:00.00 interpreter.sh
30814 yarn      20   0  113120    644    392 S   0.0  0.0   0:00.00 interpreter.sh
37253 yarn      20   0  113124   1564   1308 S   0.0  0.0   0:00.00 interpreter.sh
37265 yarn      20   0  113120    652    396 S   0.0  0.0   0:00.00 interpreter.sh
39188 yarn      20   0  113124   1580   1320 S   0.0  0.0   0:00.00 interpreter.sh
39200 yarn      20   0  113120    648    392 S   0.0  0.0   0:00.00 interpreter.sh
43612 yarn      20   0  113124   1576   1320 S   0.0  0.0   0:00.00 interpreter.sh
43624 yarn      20   0  113120    644    392 S   0.0  0.0   0:00.00 interpreter.sh
46636 yarn      20   0  113124   1516   1268 S   0.0  0.0   0:00.00 interpreter.sh
46647 yarn      20   0  113120    848    600 S   0.0  0.0   0:00.00 interpreter.sh
46648 yarn      20   0 4026284 150560  13024 S   0.0  0.9   1:56.46 java

Finally, here is what I see in the logs. I don’t know what is going on with that authorization cache complaint, but I assume it is not related?

 INFO [2017-10-17 17:53:08,312] ({qtp1286783232-8117} NotebookServer.java[onOpen]:156) - New connection from 10.201.12.29 : 54311
 INFO [2017-10-17 17:53:08,394] ({qtp1286783232-8113} NotebookServer.java[sendNote]:705) - New operation from 10.201.12.29 : 54311 : gprabhu : GET_NOTE : 2CVZZ1XWN
 INFO [2017-10-17 17:53:09,297] ({qtp1286783232-8117} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 INFO [2017-10-17 17:53:09,317] ({qtp1286783232-8117} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 INFO [2017-10-17 17:53:09,351] ({qtp1286783232-8117} AuthorizingRealm.java[getAuthorizationCacheLazy]:248) - No cache or cacheManager properties have been set.  Authorization cache cannot be obtained.
 WARN [2017-10-17 17:53:09,381] ({qtp1286783232-8117} SecurityRestApi.java[ticket]:87) - {"status":"OK","message":"","body":{"principal":"gprabhu","ticket":"d7d18244-99c5-4eeb-941c-243dc7cc0ca3","roles":"[]"}}

Paul Brenner
DATA SCIENTIST
(217) 390-3033

PlaceIQ:Landmark by PlaceIQ

On Sat, Oct 14, 2017 at 6:02 AM Fabian Böhnlein wrote:

Usually 2-3 interpreters are running and in use, and multiple users might be sharing the same interpreter in the Per Note, Scoped setting.

Though it might also happen with just 1-2 interpreters running and only a single user on the UI.


On Fri, Oct 13, 2017 at 10:44 AM Belousov Maksim Eduardovich <[hidden email]> wrote:

Paul, Ben, Fabian,

Please share your workload at the time when notes are not loading.

How many interpreters were started at that moment?

You can find all started interpreters from the Linux command line with:

ps -ef | grep ZeppelinServer | grep -v grep | awk "NR==1" | awk -F' ' '{print $2}' | xargs ps -f --ppid | wc -l

And the started Spark interpreters:

ps -ef | grep ZeppelinServer | grep -v grep | awk "NR==1" | awk -F' ' '{print $2}' | xargs ps -f --ppid | grep spark | wc -l
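
A note on the two pipelines above: `ps -f` prints a header line, so the first count is probably one higher than the real number of child processes (the second is unaffected, since `grep spark` drops the header). A sketch of an alternative that avoids this follows. The function name `count_children` is made up for illustration, and it assumes a `ps` that accepts `-e -o ppid= -o args=`.

```shell
# Hypothetical helper: count the direct children of a process, optionally
# only those whose command line contains a given substring.
# Usage: count_children PARENT_PID [SUBSTRING]
count_children() {
  # "ppid=" and "args=" suppress the header, so only process rows are counted.
  ps -e -o ppid= -o args= |
    awk -v parent="$1" -v pat="${2:-}" '
      $1 == parent {
        cmd = $0
        sub(/^[ \t]*[0-9]+[ \t]+/, "", cmd)   # drop the PPID column
        if (pat == "" || index(cmd, pat) > 0) n++
      }
      END { print n + 0 }'
}
```

For example, `zpid=$(pgrep -f ZeppelinServer | head -n 1)` followed by `count_children "$zpid"` and `count_children "$zpid" spark` should give the two numbers asked for above, assuming `pgrep` is available and only one Zeppelin server is running.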

 

 


Maksim Belousov
Architect

Reporting and Data Marts Department

Data Warehouse and Reporting Division
Tel.: +7 495 648-10-00, ext. 2271

 

From: Paul Brenner [mailto:[hidden email]]
Sent: Thursday, October 12, 2017 7:45 PM
To: Geoffrey Cheng <[hidden email]>; [hidden email]
Subject: Re: Zeppelin Stops Loading Notes

 

Does this issue need a Jira ticket? The problem is that I have no idea how to reproduce it, and I’m not sure whether anything in the logs is relevant.

 

Any ideas how we can produce an actionable Jira ticket out of this?

Paul Brenner

DATA SCIENTIST

(217) 390-3033


PlaceIQ:Landmark by PlaceIQ

On Thu, Oct 12, 2017 at 8:31 AM Geoffrey Cheng <[hidden email]> wrote:

We have the same issue. Usually, when multiple people are using it, only the header loads.

We tried but couldn't find a solution, so we restart every single time. In fact, we have to restart at least daily.

 

On Oct 12, 2017 2:40 AM, "Fabian Böhnlein" <[hidden email]> wrote:

Hi Paul, Ben,

 

We also see this happen regularly. It's more likely to happen when a handful of people are using it.

 

We mostly run one spark interpreter per person. We also don't observe anything in the logs. The 'header' that you mentioned is actually still in the cache.

 

Sometimes it's specific notes that don't load.

Sometimes there's a hanging Spark interpreter; once it's killed, notes load again.

 

We're pretty clueless about it.

 

Any front-end related logs we could enable to find out more?

 

On Sat, 19 Aug 2017 at 20:19 Ben Vogan <[hidden email]> wrote:

I have seen Zeppelin get into this state once. However, I restarted it without investigating the logs, so I don't have anything useful to go on as to why.

 

--Ben

 

On Sat, Aug 19, 2017 at 8:17 AM, Paul Brenner <[hidden email]> wrote:

You were correct. We had "export ZEPPELIN_SSL_PORT=false" in our zeppelin-env.sh. I’m going to comment that out. I suspect it is actually unrelated to the behavior we are seeing where pages stop loading, though. Has anyone else seen this happen?

 

I’ll report back if that happens again after the fix.
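
For anyone wanting to catch the same kind of slip (a `*_PORT` variable set to something that is not a number), here is a minimal sketch. The function name `check_port_vars` is made up for illustration, and the check is a naive pattern match rather than anything Zeppelin itself performs; single-quoted values, for instance, would be flagged even if valid.

```shell
# Hypothetical helper: print "file:line: text" for every
# "export NAME_PORT=value" line whose value is not a plain number.
# Returns non-zero if any such line is found.
# Usage: check_port_vars FILE
check_port_vars() {
  awk '
    /^[ \t]*export[ \t]+[A-Za-z_]*PORT=/ {
      value = $0
      sub(/^[^=]*=/, "", value)       # keep only the text after "="
      sub(/[ \t]+#.*$/, "", value)    # drop a trailing comment, if any
      gsub(/[" \t]/, "", value)       # drop double quotes and whitespace
      if (value !~ /^[0-9]+$/) { print FILENAME ":" FNR ": " $0; bad = 1 }
    }
    END { exit bad }
  ' "$1"
}
```

Running `check_port_vars conf/zeppelin-env.sh` (the path is an assumption about a typical install) would print the `ZEPPELIN_SSL_PORT=false` line described above and stay silent for numeric values.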