Setting spark config properties in Zeppelin 0.7.2

Jung, Soonoh
Hi zeppelin users,

I ran into an issue after upgrading Zeppelin from 0.6.2 to 0.7.2.

I am using the spark-redis 0.3.2 library to load Redis values.

To use that library, I have to set the "redis.host" property on the Spark config instance.
This used to work in Zeppelin 0.6.2 but does not in 0.7.2.

How can I set a Spark config property in Zeppelin 0.7.2?

The interpreter property setting:

Properties
name           value
args
master         yarn-client
redis.host     xxx.cache.amazonaws.com
redis.timeout  60000

A sample test note:

-----------
%spark

import com.redislabs.provider.redis._

sc.getConf.getAll.foreach(println)

val rdd = sc.fromRedisKV("test_key")
----------


I expect "(redis.host,xxx.cache.amazonaws.com)" to appear in the output, but it does not.
Output:

--------------
(spark.eventLog.enabled,true)
(spark.submit.pyArchives,pyspark.zip:py4j-0.9-src.zip:py4j-0.8.2.1-src.zip:py4j-0.10.1-src.zip:py4j-0.10.3-src.zip:py4j-0.10.4-src.zip)
(spark.network.timeout,300s)
(spark.executor.instances,15)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.app.id,application_1504247249188_0005)
(spark.executor.memory,5g)
(spark.driver.memory,5g)
(spark.executor.cores,4)
(spark.submit.pyFiles,file:/usr/lib/spark/python/lib/pyspark.zip,file:/usr/lib/spark/python/lib/py4j-0.10.4-src.zip)
(spark.serializer,org.apache.spark.serializer.KryoSerializer)
(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES,http://ip-172-31-2-15.ap-northeast-1.compute.internal:20888/proxy/application_1504247249188_0005)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.sql.warehouse.dir,hdfs:///user/spark/warehouse)
(spark.jars,file:/usr/lib/zeppelin/interpreter/spark/zeppelin-spark_2.10-0.7.2.jar)
(spark.repl.class.outputDir,/mnt/tmp/spark-fa28d8f7-d675-4181-99d2-1bd6ef67db5c)
(spark.submit.deployMode,client)
(spark.yarn.dist.archives,/usr/lib/spark/R/lib/sparkr.zip#sparkr)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.ui.filters,org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter)
(spark.driver.extraJavaOptions, -Dfile.encoding=UTF-8 -Dlog4j.configuration=file:///etc/zeppelin/conf/log4j.properties -Dzeppelin.log.file=/var/log/zeppelin/zeppelin-interpreter-spark-zeppelin-ip-172-31-2-15.log)
(spark.app.name,am-zeppelin-segmentation-prod)
(spark.driver.port,34316)
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar)
(spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS,ip-172-31-2-15.ap-northeast-1.compute.internal)
(spark.history.ui.port,18080)
(spark.sql.catalogImplementation,in-memory)
(spark.home,/usr/lib/spark)
(master,yarn)
(spark.shuffle.service.enabled,true)
(spark.master,yarn-client)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.scheduler.mode,FAIR)
(spark.dynamicAllocation.cachedExecutorIdleTimeout ,1200s)
(spark.dynamicAllocation.executorIdleTimeout,30s)
(spark.yarn.dist.files,file:/etc/spark/conf/hive-site.xml,file:/usr/lib/spark/python/lib/pyspark.zip,file:/usr/lib/spark/python/lib/py4j-0.10.4-src.zip)
(spark.executorEnv.PYTHONPATH,/usr/lib/spark/python/lib/py4j-src.zip:/usr/lib/spark/python/:<CPS>{{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-src.zip)
(spark.yarn.historyServer.address,ip-172-31-2-15.ap-northeast-1.compute.internal:18080)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.yarn.isPython,true)
(spark.dynamicAllocation.enabled,true)
(spark.driver.host,172.31.2.15)
(spark.repl.class.uri,spark://172.31.2.15:34316/classes)
(spark.driver.extraClassPath,:/usr/lib/zeppelin/local-repo/2CRQYKAQF/*:/usr/lib/zeppelin/interpreter/spark/*:/usr/lib/zeppelin/lib/interpreter/*::/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/lib/zeppelin/interpreter/spark/zeppelin-spark_2.10-0.7.2.jar)
...
-------------


Best regards,
Soonoh

Re: Setting spark config properties in Zeppelin 0.7.2

Jeff Zhang

This happens because only spark.* properties are accepted as Spark properties; it seems some libraries still use non-spark.* property names.

I created https://issues.apache.org/jira/browse/ZEPPELIN-2893 for that and will fix it in 0.7.3.
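
To make the behavior described above concrete, here is a minimal Scala sketch of a prefix filter. It is not Zeppelin's actual source code: the object name `SparkPropertyFilterSketch` and the helper `acceptedAsSparkProperties` are hypothetical and exist only for illustration. It assumes that interpreter properties are forwarded to the Spark config only when their key starts with "spark.", which would explain why "redis.host" is missing from the printed output.

```scala
// Illustrative sketch only: this is NOT Zeppelin's actual source code.
// It models the behavior described in this thread, where only keys
// starting with "spark." are accepted as Spark properties.
object SparkPropertyFilterSketch {

  // Hypothetical helper: keep only entries whose key begins with "spark.".
  def acceptedAsSparkProperties(props: Map[String, String]): Map[String, String] =
    props.filter { case (key, _) => key.startsWith("spark.") }

  def main(args: Array[String]): Unit = {
    // The two redis.* properties from the original post, plus one spark.*
    // key taken from the printed output for contrast.
    val interpreterProps = Map(
      "redis.host" -> "xxx.cache.amazonaws.com",
      "redis.timeout" -> "60000",
      "spark.executor.memory" -> "5g"
    )

    val accepted = acceptedAsSparkProperties(interpreterProps)
    val dropped = interpreterProps.keySet -- accepted.keySet

    // Print which keys would reach the Spark config and which would not.
    println(s"accepted: ${accepted.keys.toSeq.sorted.mkString(", ")}")
    println(s"dropped:  ${dropped.toSeq.sorted.mkString(", ")}")
  }
}
```

Under that assumption, both redis.* keys end up in the dropped set, which matches what the test note shows for Zeppelin 0.7.2.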

On Fri, Sep 1, 2017 at 5:33 PM, Jung, Soonoh <[hidden email]> wrote:

Re: Setting spark config properties in Zeppelin 0.7.2

Jung, Soonoh
Thank you for creating the issue and for the upcoming fix.

Regards,
Soonoh

On 1 September 2017 at 18:50, Jeff Zhang <[hidden email]> wrote:

This happens because only spark.* properties are accepted as Spark properties; it seems some libraries still use non-spark.* property names.

I created https://issues.apache.org/jira/browse/ZEPPELIN-2893 for that and will fix it in 0.7.3.
