z.load() Must be used before SparkInterpreter (%spark) initialized?


Richard Xin
I ran the following in a %dep paragraph:
z.load("path/to/jar")
and got the following error:
Must be used before SparkInterpreter (%spark) initialized
Hint: put this paragraph before any Spark code and restart Zeppelin/Interpreter

Restarting Zeppelin did make it work, so this seems to be expected behavior, but I don't understand the reason behind it. If I have to restart Zeppelin every time I want to dynamically add an external jar, then this feature is useless to most people.

Richard Xin

Re: z.load() Must be used before SparkInterpreter (%spark) initialized?

Jeff Zhang

You don't need to restart Zeppelin; you only need to restart the Spark interpreter.
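
For readers who hit the same error, the paragraph order that the hint describes can be sketched roughly as follows. This is only a minimal sketch under a few assumptions: the Spark interpreter has just been restarted from the Interpreter page, `path/to/jar` is the placeholder from the original post, and `com.example.SomeClass` is a hypothetical class standing in for whatever the jar provides. The two parts are separate notebook paragraphs, not one paragraph.

```scala
%dep
// First paragraph: run it right after restarting the Spark interpreter,
// before any %spark paragraph has been executed.
z.reset()               // clear artifacts registered by earlier %dep runs
z.load("path/to/jar")   // placeholder path from the original post

// ---- next paragraph ----

%spark
// Second paragraph: the Spark interpreter starts here, with the jar on its classpath.
import com.example.SomeClass   // hypothetical class from the loaded jar
```

If a %spark paragraph has already run in the current interpreter process, the %dep paragraph fails with the error quoted above until the interpreter is restarted.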
 

(Replying to Richard Xin's message of Wed, Jul 26, 2017, 12:53 AM.)

RE: z.load() Must be used before SparkInterpreter (%spark) initialized?

Davidson, Jonathan

We’ve also found it undesirable that extra jars cannot be loaded without restarting the interpreter. Is the best mitigation to run in isolated mode (per note or per user), so that other users are less affected? Is there any development in progress to allow loading without a restart?

 

Thanks!

 

(Replying to Jeff Zhang's message of Tuesday, July 25, 2017, 8:31 PM.)

 

 


Re: z.load() Must be used before SparkInterpreter (%spark) initialized?

Rick Moritz
Please allow me to offer an opinion on the subject:

To me, there are two options: you either run the Spark interpreter in isolated mode, or you set up dedicated Spark interpreter groups per organizational unit, so that each unit can manage its dependencies independently.
There's no way around restarting the interpreter when you need to make the classloader aware of additional jars, never mind distributing those jars across the cluster without calling spark-submit. Since an interpreter represents an actual running JVM, you need to treat it as such. I assume that is also why z.load has been superseded by dependency configuration in the interpreter settings.

A good way to manage dependencies is to collate all the dependencies per unit into a fat jar and manage that via an external build. That way you get testable dependencies and a curated experience where everything just works, as long as someone puts that effort in. With a collaborative tool, that's still better than everyone adding their favorite library and causing each interpreter start to pull in half the Internet in transitive dependencies, with potential conflicts to boot. Zeppelin will be slow if every interpreter start begins by uploading a gigabyte of dependencies to the cluster.
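
As an illustration of the fat-jar approach, an external build could look something like the sbt definition below. This is a sketch, not a recommendation of specific versions: the project name, the `com.example` libraries, and the Scala and Spark versions are all assumptions, and it presumes the sbt-assembly plugin has been added in project/plugins.sbt.

```scala
// build.sbt: hypothetical project that bundles one team's notebook dependencies.
// Assumes sbt-assembly is enabled in project/plugins.sbt, for example:
//   addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

name := "team-notebook-deps"   // hypothetical project name
version := "0.1.0"
scalaVersion := "2.11.8"       // assumed; match the Scala version of your Spark build

libraryDependencies ++= Seq(
  // Spark is already on the interpreter's classpath, so keep it out of the fat jar.
  "org.apache.spark" %% "spark-sql" % "2.2.0" % "provided",   // assumed Spark version
  // Hypothetical libraries the team wants available in every note.
  "com.example" %% "team-utils" % "1.0.0",
  "com.example" % "legacy-java-lib" % "2.3.1"
)

// Fixed output name, so the interpreter setting can always point at the same file.
assemblyJarName in assembly := "team-notebook-deps.jar"
```

Running `sbt assembly` would then produce a single jar that can be listed once in the interpreter's dependency settings, and conflicts show up in the build rather than at interpreter start.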

In an ad hoc, almost-single-user environment, you can work well with Zeppelin's built-in dependency management, but I don't really see it scaling to the enterprise level, and I don't think it should. There's no point in investing resources in something that external tools can already provide easily.

I wouldn't deploy Zeppelin as enterprise infrastructure either: deploy one Zeppelin per project, and manage segregation there with separate interpreters. This also helps with finer-grained resource management.

I hope this helps your understanding and gives you some pointers on how to manage Zeppelin so that there are fewer conflicts between users.

(Replying to Jonathan Davidson's message of Wed, Jul 26, 2017, 2:30 PM.)


Re: z.load() Must be used before SparkInterpreter (%spark) initialized?

Jeff Zhang

Thanks, Rick, for the detailed explanation. It should be very helpful for users.

Personally, I would suggest that users set additional jars in the interpreter setting instead of using %dep. As a long-term solution, I am considering putting configuration into the note itself: for each interpreter, there would be one special interpreter that initializes the interpreter setting (overriding Zeppelin's default interpreter setting). This special interpreter could be called something like %spark.init.

Users would need to put this in the first paragraph of the note, before running any other paragraph. The purpose is to include not only the code but also the configuration in the note, so that users can rerun the note on other Zeppelin instances. This makes the note a self-contained concept without any external dependencies.




(Replying to Rick Moritz's message of Thu, Jul 27, 2017, 12:52 AM.)