Need help with remote YARN cluster setup


Need help with remote YARN cluster setup

Litt, Shaun

So I am trying to get Zeppelin running against our YARN cluster, but it doesn't seem to actually be submitting the Spark jobs to YARN (they never show up in the UI or logs).

 

My current ENV settings are:

export MASTER=yarn-client                 # Spark master url. eg. spark://master_addr:7077. Leave empty if you want to use local mode.
export ZEPPELIN_JAVA_OPTS="-Dspark.shuffle.service.enabled=true -Dspark.shuffle.service.port=7337 -Dspark.shuffle.consolidateFiles=true -Dspark.akka.askTimeout=60 -Dspark.akka.frameSize=500 -Dspark.executor.memory=8g -Dspark.cores.max=48 -Dspark.yarn.queue=root.heds.dw.dev -Dspark.serializer=org.apache.spark.serializer.KryoSerializer"      # Additional jvm options. for example, export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=8g -Dspark.cores.max=16"
export HADOOP_CONF_DIR=/etc/hadoop/conf         # yarn-site.xml is located in configuration directory in HADOOP_CONF_DIR.
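Because yarn-client mode depends on HADOOP_CONF_DIR pointing at the cluster's client-side Hadoop configuration, it can be worth ruling that out first. Below is a minimal sketch, assuming the settings above live in Zeppelin's conf/zeppelin-env.sh and that the directory should hold the usual yarn-site.xml and core-site.xml; `check_hadoop_conf` is a hypothetical helper name, not something Zeppelin or Spark provides.

```bash
#!/usr/bin/env bash
# Check that a Hadoop client config directory (HADOOP_CONF_DIR by default)
# contains the files Spark's YARN client reads: yarn-site.xml for the
# ResourceManager and core-site.xml for the default filesystem.
# check_hadoop_conf is a hypothetical helper, not part of Zeppelin or Spark.
check_hadoop_conf() {
  local dir="${1:-${HADOOP_CONF_DIR:-}}"
  if [ -z "$dir" ]; then
    echo "HADOOP_CONF_DIR is not set" >&2
    return 1
  fi
  local f
  for f in yarn-site.xml core-site.xml; do
    if [ ! -r "$dir/$f" ]; then
      echo "missing or unreadable: $dir/$f" >&2
      return 1
    fi
  done
  echo "ok: $dir"
}
```

Running `. conf/zeppelin-env.sh && check_hadoop_conf` from the Zeppelin directory checks the same value the interpreter would see. A passing check only shows the files are present; it does not prove their contents are correct.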

 

I have confirmed that I can run my job with all of these settings via spark-shell on the same server:

spark-shell --master yarn-client --queue root.heds.dw.dev --conf spark.shuffle.service.enabled=true --conf spark.shuffle.service.port=7337 --conf spark.shuffle.consolidateFiles=true --conf spark.akka.askTimeout=60 --conf spark.akka.frameSize=500 --conf spark.executor.memory=8g --conf spark.cores.max=48 --conf spark.serializer=org.apache.spark.serializer.KryoSerializer
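To make sure the spark-shell test really uses the same settings as Zeppelin, the `-D` options can be translated mechanically into `--conf` flags instead of being retyped. This is a minimal sketch, assuming no option value contains whitespace (true for the options above); `java_opts_to_conf` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Convert "-Dkey=value" JVM options (as used in ZEPPELIN_JAVA_OPTS) into the
# equivalent "--conf key=value" flags for spark-shell / spark-submit.
# Assumption: no option value contains whitespace.
# java_opts_to_conf is a hypothetical helper, not part of Zeppelin or Spark.
java_opts_to_conf() {
  local -a opts=() out=()
  local opt
  read -r -a opts <<< "$1"          # split on whitespace, no globbing
  for opt in "${opts[@]}"; do
    case "$opt" in
      -D*) out+=(--conf "${opt#-D}") ;;   # strip the leading -D
      *)   echo "skipping non -D option: $opt" >&2 ;;
    esac
  done
  echo "${out[*]}"                  # space-separated flag list
}
```

For example, `spark-shell --master yarn-client $(java_opts_to_conf "$ZEPPELIN_JAVA_OPTS")` passes `spark.yarn.queue` as a `--conf` flag rather than `--queue`, which should select the same queue. If that shell reaches YARN while Zeppelin does not, the settings themselves are probably fine and the difference is more likely in how Zeppelin was built or what is on its classpath.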

 

I'm not entirely sure what I have configured wrong.

 

My interpreter throws a bunch of Akka errors like these:

ERROR [2015-06-26 13:22:26,660] ({sparkDriver-akka.actor.default-dispatcher-4} Slf4jLogger.scala[apply$mcV$sp]:66) - Uncaught fatal error from thread [sparkDriver-akka.actor.default-dispatcher-2] shutting down ActorSystem [sparkDriver]
java.lang.AbstractMethodError
    at akka.actor.ActorCell.create(ActorCell.scala:580)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:456)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:263)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

 

ERROR [2015-06-26 13:22:27,262] ({pool-2-thread-2} Job.java[run]:183) - Job failed
org.apache.zeppelin.interpreter.InterpreterException: java.lang.IllegalStateException: cannot create children while terminating or terminated
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:75)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:277)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
    at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalStateException: cannot create children while terminating or terminated
    at akka.actor.dungeon.Children$class.makeChild(Children.scala:200)
    at akka.actor.dungeon.Children$class.attachChild(Children.scala:42)
    at akka.actor.ActorCell.attachChild(ActorCell.scala:369)
    at akka.actor.ActorSystemImpl.actorOf(ActorSystem.scala:552)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:139)
    at org.apache.spark.storage.BlockManager.<init>(BlockManager.scala:179)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:310)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:163)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:270)
    at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:276)
    at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:149)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:398)
    at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:73)
    ... 12 more

 

Thanks,

 

Shaun Litt

This email and any files included with it may contain privileged,
proprietary and/or confidential information that is for the sole use
of the intended recipient(s).  Any disclosure, copying, distribution,
posting, or use of the information contained in or attached to this
email is prohibited unless permitted by the sender.  If you have
received this email in error, please immediately notify the sender
via return email, telephone, or fax and destroy this original transmission
and its included files without reading or saving it in any manner.
Thank you.


Re: Need help with remote YARN cluster setup

moon
Administrator
Hi,

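An AbstractMethodError usually means that a class on the runtime classpath is binary-incompatible with the version the calling code was compiled against.

So could you do a clean build of Zeppelin and check that the Maven build profiles for Spark and Hadoop match the versions you are actually running?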
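Following that suggestion, the build command can be derived from the cluster's own Spark and Hadoop versions so the profiles cannot drift. This is a minimal sketch that only prints the command. ASSUMPTION: the profile names follow the `spark-X.Y` / `hadoop-X.Y` / `yarn` pattern shown in Zeppelin's README of this period; check the pom.xml of your checkout, since not every version has a matching profile. `major_minor` and `zeppelin_build_cmd` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Print (not run) a clean Zeppelin build command whose Maven profiles match
# the cluster's Spark and Hadoop versions.
# ASSUMPTION: profile names follow the spark-X.Y / hadoop-X.Y / yarn pattern
# from Zeppelin's README of this period; verify against pom.xml.

# "2.6.0-cdh5.4.2" -> "2.6": drop any vendor suffix, keep major.minor
major_minor() {
  local v="${1%%-*}"
  case "$v" in
    *.*) ;;                        # has at least major.minor
    *) echo "$v"; return ;;        # single component: return as-is
  esac
  local major="${v%%.*}"
  local rest="${v#*.}"
  echo "${major}.${rest%%.*}"
}

zeppelin_build_cmd() {
  local spark_ver="$1" hadoop_ver="$2"
  # hadoop.version gets the full string; the profile gets only major.minor
  echo "mvn clean package -Pspark-$(major_minor "$spark_ver")" \
       "-Dhadoop.version=$hadoop_ver -Phadoop-$(major_minor "$hadoop_ver")" \
       "-Pyarn -DskipTests"
}
```

The versions to pass in can be read on the cluster node from `spark-submit --version` and `hadoop version`. A vendor suffix in the Hadoop version is passed through unchanged to `-Dhadoop.version`, and only the profile name is shortened.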

Thanks,
moon
