Class not found exception: Spark on Zeppelin


Udit Mehta
Hi,

I am running Spark on Zeppelin and am trying to create some temp tables so I can run SQL queries against them.
I have JSON data on HDFS that I am trying to load as a JSON RDD.
Here are my commands:

val data = sc.sequenceFile("/user/ds=01-02-2015/hour=2/*",
  classOf[org.apache.hadoop.io.NullWritable],
  classOf[org.apache.hadoop.io.Text]).map { case (k, v) => v.toString() }

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val recordsJson = sqlContext.jsonRDD(data)
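For context, once the jsonRDD call succeeds, the next step I plan to run is roughly the following (a sketch only; "records" is just a placeholder table name):

// register the loaded JSON as a temp table and query it
recordsJson.registerTempTable("records")
val result = sqlContext.sql("SELECT * FROM records LIMIT 10")
result.collect().foreach(println)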

And here is the error I get, which clearly shows it is failing at the jsonRDD step:

data: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[11] at map at <console>:26
import org.apache.spark.sql.SQLContext
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@313547c4
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, gdoop-worker31.snc1): java.lang.ClassNotFoundException: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
	at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:278)
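From what I understand, the $iwC$$iwC...$$anonfun$1 class is the anonymous function from my map step: the Spark REPL wraps each interpreted line in nested $iwC classes, and the executors fetch those generated classes from the driver via ExecutorClassLoader, which is where the lookup fails. As a sanity check I can try the same pipeline in a plain spark-shell (same Spark 1.3.1 assembly, outside Zeppelin) to rule out the cluster setup itself:

// same pipeline, run in bin/spark-shell --master yarn-client for comparison
val data = sc.sequenceFile("/user/ds=01-02-2015/hour=2/*",
  classOf[org.apache.hadoop.io.NullWritable],  // assuming the sequence-file keys are NullWritable
  classOf[org.apache.hadoop.io.Text]).map { case (_, v) => v.toString }

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val recordsJson = sqlContext.jsonRDD(data)  // schema inference runs a job over the data here
recordsJson.printSchema()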

I built Zeppelin using:
mvn clean package -DskipTests -Pspark-1.3 -Phadoop-2.6 -Dhadoop.version=2.6.0 -Pyarn
mvn clean package -P build-distr -DskipTests

Lastly, here are my configs:

interpreter.json (spark section):
      "id": "2ARHCUUUZ",
      "name": "spark",
      "group": "spark",
      "properties": {
        "spark.executor.memory": "512m",
        "args": "",
        "spark.yarn.jar": "hdfs://namenode-vip.snc1:8020/spark/spark-assembly-1.3.1-hadoop2.6.0.jar",
        "spark.cores.max": "",
        "zeppelin.spark.concurrentSQL": "false",
        "zeppelin.spark.useHiveContext": "true",
        "zeppelin.pyspark.python": "python",
        "zeppelin.dep.localrepo": "local-repo",
        "spark.home": "/usr/local/lib/spark-1.3",
        "spark.yarn.am.extraJavaOptions": "-Dhdp.version\u003d2.2.0.0-2041",
        "zeppelin.spark.maxResult": "1000",
        "master": "yarn-client",
        "spark.yarn.queue": "public",
        "spark.yarn.access.namenodes": "hdfs://namenode1.snc1:8032,hdfs://namenode2.snc1:8032",
        "spark.scheduler.mode": "FAIR",
        "spark.dynamicAllocation.enabled": "false",
        "spark.executor.extraLibraryPath": "/usr/lib/hadoop/lib/native/Linux-amd64-64",
        "spark.executor.extraJavaOptions": "-Dhdp.version\u003d2.2.0.0-2041",
        "spark.app.name": "Zeppelin",
        "spark.driver.extraLibraryPath": "/usr/lib/hadoop/lib/native/Linux-amd64-64",
        "spark.driver.extraJavaOptions": "-Dhdp.version\u003d2.2.0.0-2041"
      }

zeppelin-env.sh:
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_CLASSPATH=/usr/lib/hadoop/lib/*:/usr/lib/hadoop/lib/native/Linux-amd64-64
export ZEPPELIN_PORT=10020
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.2.0.0-2041 -Dspark.jars=/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.2.2.0.0-2041.jar"

Would anyone be able to help with this problem?

Thanks in advance,
Udit