ZEPPELIN_JAVA_OPTS not working?

ZEPPELIN_JAVA_OPTS not working?

Axel Dahl
I downloaded and compiled the latest Zeppelin.

In my conf/zeppelin-env.sh file I have the following line:

export ZEPPELIN_JAVA_OPTS="-Dspark.files=/home/hduser/lib/sparklib.zip,/home/hduser/lib/service.cfg,/home/hduser/lib/helper.py"

This used to work, but when I inspect the folder using SparkFiles.getRootDirectory(), none of those files appear there.
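
(Concretely, the check I run in a pyspark notebook paragraph looks like this; SparkFiles comes from pyspark:)

from pyspark import SparkFiles
import os
# files distributed via spark.files should land in this directory
print os.listdir(SparkFiles.getRootDirectory())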

I have checked that all the files are accessible at the specified paths. There's nothing in the logs to indicate that ZEPPELIN_JAVA_OPTS was read, but other entries (e.g. SPARK_HOME) clearly are being read.

Did this change from previous versions?

-Axel


Re: ZEPPELIN_JAVA_OPTS not working?

moon
Administrator
Hi,


https://github.com/apache/incubator-zeppelin/pull/270 is not yet merged, but I suggest trying it. It makes Zeppelin use spark-submit when you have SPARK_HOME defined. You'll then just need to define your spark.files in SPARK_HOME/conf/spark-defaults.conf, without adding them to ZEPPELIN_JAVA_OPTS.
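
For example, with the same paths from your ZEPPELIN_JAVA_OPTS, the entry would look something like:

# SPARK_HOME/conf/spark-defaults.conf
spark.files /home/hduser/lib/sparklib.zip,/home/hduser/lib/service.cfg,/home/hduser/lib/helper.py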

Thanks,
moon


Re: ZEPPELIN_JAVA_OPTS not working?

Axel Dahl
Thanks moon,

I set spark.files in SPARK_HOME/conf/spark-defaults.conf.

When I run the spark/bin/pyspark shell, it finds and adds these files; but when I execute spark/bin/spark-submit, it doesn't. spark-submit does read spark-defaults.conf (it picks up the spark.master entry) but for some reason ignores the spark.files directive, which is very strange since the pyspark shell loads them properly.
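
(A quick way to see what spark-submit actually resolved is its --verbose flag; myapp.py here is just a placeholder script:)

# prints the resolved Spark properties, so you can check whether
# spark.files from spark-defaults.conf survives
$SPARK_HOME/bin/spark-submit --verbose myapp.py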



Re: ZEPPELIN_JAVA_OPTS not working?

Axel Dahl
So it seems that when you call, say:

spark-submit xyz.py

it converts xyz.py into the option "spark.files xyz.py", and because xyz.py was entered on the command line, it overwrites the "spark.files" entry in spark-defaults.conf.

Is there another way to add py-files via spark-defaults.conf, or another way to configure Zeppelin to always add a set of configured files to the spark-submit job?

-Axel


Re: ZEPPELIN_JAVA_OPTS not working?

moon
Administrator
Maybe then you can try this option:

export SPARK_SUBMIT_OPTIONS="--py-files [comma separated list of .zip, .egg or .py]"

in conf/zeppelin-env.sh
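
For example, with the paths from your first mail (note that --py-files only takes .zip, .egg or .py, so the .cfg file would go through a separate --files option instead):

export SPARK_SUBMIT_OPTIONS="--py-files /home/hduser/lib/sparklib.zip,/home/hduser/lib/helper.py --files /home/hduser/lib/service.cfg"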

Thanks,
moon

Re: ZEPPELIN_JAVA_OPTS not working?

Axel Dahl
That doesn't seem to have any effect. :/ There's nothing in the logs to indicate it was added, and I'm getting the same error when trying to load it.

Re: ZEPPELIN_JAVA_OPTS not working?

moon
Administrator
Let me investigate more...

Re: ZEPPELIN_JAVA_OPTS not working?

Axel Dahl
I've also filed a bug against Apache Spark:





Re: ZEPPELIN_JAVA_OPTS not working?

moon
Administrator
I have pushed a few commits to make spark.files work for pyspark.
Please check https://github.com/apache/incubator-zeppelin/pull/270 and let me know if it helps.

Thanks,
moon

Re: ZEPPELIN_JAVA_OPTS not working?

Axel Dahl
I applied PR-270 and I do see --files being populated, but the files still don't appear in the folder. Here are two screenshots (I'm adding /home/hduser/lib/sparklib.zip).

Here's the command on the Zeppelin environment page:
[Screen Shot 2015-09-04 at 7.23.44 AM.png]

But further down the page I only see these three entries:
[Screen Shot 2015-09-04 at 7.24.14 AM.png]

I was going to try submitting the command above manually to see if I can figure out why it doesn't work.

Re: ZEPPELIN_JAVA_OPTS not working?

moon
Administrator
I have added

spark.files /tmp/my.py

to SPARK_HOME/conf/spark-defaults.conf:
[pasted screenshot]

and I can see my.py in the classpath entries:
[pasted screenshot]

I have tested it with PR-270 and in yarn-client mode.
The only configuration I have for Zeppelin is

export SPARK_HOME=/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.6

in conf/zeppelin-env.sh and "yarn-client" in interpreter setting page.

Do you see any difference with your configuration?

Thanks,
moon

Re: ZEPPELIN_JAVA_OPTS not working?

Axel Dahl
Hmm, I'm still not seeing any evidence that Zeppelin is reading the conf/spark-defaults.conf file.

I'm running the Spark 1.4 / Hadoop 2.3 binary in standalone mode.

I just:

1. downloaded and compiled latest from master
2. configured SPARK_HOME in conf/zeppelin-env.sh
3. configured the master in the interpreter settings
4. then in a notebook I execute the following:

from pyspark import SparkFiles
import os
for p in os.listdir(SparkFiles.getRootDirectory()):
    print "FOUND: " + str(p)

so that I can launch the interpreter and see whether the files got copied. I also check Spark's environment page to see if the files were added there.
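
(One more thing I dump from the notebook to see what the interpreter's SparkConf actually holds; sc is the SparkContext Zeppelin provides, and _conf is its internal SparkConf, so treat this as a debugging hack:)

# lists every property the SparkContext was created with,
# one key=value per line
print sc._conf.toDebugString()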

No luck so far :(



Re: ZEPPELIN_JAVA_OPTS not working?

moon
Administrator
I have tested with the Spark 1.4 / Hadoop 2.6 binary in standalone mode, with:

1. conf/spark-defaults.conf

spark.master spark://masterhost:7077
spark.files /tmp/my.py

2. SPARK_HOME in conf/zeppelin-env.sh
3. configure the master in the interpreter setting
4. execute code in the notebook 

from pyspark import SparkFiles
import os
for p in os.listdir(SparkFiles.getRootDirectory()):
    print "FOUND: " + str(p)

I get the result:

FOUND: my.py
FOUND: my.pyc

Is there any other way to reproduce the error?

Thanks,
moon


Re: ZEPPELIN_JAVA_OPTS not working?

Axel Dahl
Apparently not.

I just pulled from the latest source and it's working now. Hmm. Thanks for all your help with this, moon. Very much appreciated.

-Axel
