I hope this is a simple one and you can help me 😊
I am having trouble adding a dependency to Zeppelin Notebook (0.7.0) on AWS EMR (emr-5.4.0). I notice that the %dep interpreter is not available on AWS EMR, so I can't use that option.
I followed these instructions to add the dependency: https://zeppelin.apache.org/docs/latest/manual/dependencymanagement.html
I want to add the databricks spark-xml package for importing xml files to dataframes: https://github.com/databricks/spark-xml
The groupId:artifactId:version is com.databricks:spark-xml_2.11:0.4.1.
In Zeppelin, when I go to edit the spark interpreter:
* I enter com.databricks:spark-xml_2.11:0.4.1 in the artifact field.
* When I click OK in the dialog "Do you want to update this interpreter and restart with new settings - Cancel | OK", nothing happens; the dialog stays on screen.
I assume this writes the dependency to the spark group in interpreter.json — is that correct? I tried altering write permissions on that file, but that didn't help.
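For reference, this is roughly the shape of the dependencies entry I would expect to see in interpreter.json (a sketch based on my reading of the docs; the surrounding setting id and exact field names may differ in 0.7.0):

```json
{
  "name": "spark",
  "dependencies": [
    {
      "groupArtifactVersion": "com.databricks:spark-xml_2.11:0.4.1",
      "local": false
    }
  ]
}
```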
I confirmed this coordinate is correct for my Spark/Scala version by running it in spark-shell, and since that works I assume I don't need to add any additional Maven repository. Or do I need to add a new repo after all?
Maybe I need to put the jar in my local repo? interpreter.json says my local repo is /var/lib/zeppelin/.m2/repository, but that directory does not exist.
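If the missing directory is the problem, presumably it could be created to match what interpreter.json expects with something like the following (the zeppelin user/group name is my assumption for EMR; adjust as needed):

```shell
# Create the local Maven repo path Zeppelin expects and give the
# zeppelin user ownership (user/group name assumed, not verified).
sudo mkdir -p /var/lib/zeppelin/.m2/repository
sudo chown -R zeppelin:zeppelin /var/lib/zeppelin/.m2
```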
I can use this package from spark-shell successfully (the read call below follows the spark-xml README; rowTag and the file path are just my test values):

$ spark-shell --packages com.databricks:spark-xml_2.11:0.4.1

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "book")
  .load("books.xml")
Thanks for reporting the problem.
The downloaded dependency will be stored under the 'local-repo' directory (by default). For example, after I add com.databricks:spark-xml_2.11:0.4.1 in the spark interpreter setting:

moon$ ls local-repo/2CD5YP3GK/

I see two files downloaded under the ZEPPELIN_HOME/local-repo/[INTERPRETER_ID] directory.
Hope this helps.
On Thu, Apr 13, 2017 at 10:42 AM David Howell <[hidden email]> wrote: