Dependency management

Dependency management

David Howell

Hi users,

I hope this is a simple one and you can help me 😊

I am having trouble adding a dependency to Zeppelin Notebook (0.7.0) on AWS EMR (emr-5.4.0). I notice the %dep interpreter is not available on AWS EMR, so I can’t use that option.
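
For reference, the %dep route I can’t use would have looked something like this, as far as I understand it (it has to run before the Spark context starts):

%dep
z.reset()
z.load("com.databricks:spark-xml_2.11:0.4.1")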

 

I followed these instructions to add the dependency: https://zeppelin.apache.org/docs/latest/manual/dependencymanagement.html

 

I want to add the Databricks spark-xml package for importing XML files into DataFrames: https://github.com/databricks/spark-xml

 

This is the groupId:artifactId:version:

com.databricks:spark-xml_2.11:0.4.1

 

In Zeppelin, when I go to edit the spark interpreter:

* I enter com.databricks:spark-xml_2.11:0.4.1 in the artifact field,

* click Save,

* and when I click OK on the dialog “Do you want to update this interpreter and restart with new settings – Cancel | OK”, nothing happens; the dialog stays on screen.

 

I assume this writes the dependency to the spark group in interpreter.json; is that correct? I tried altering the write permissions on that file, but that didn’t help.
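
For anyone checking their own file: as far as I can tell the saved entry inside the spark interpreter setting in interpreter.json should look roughly like this (a sketch from memory; field names may vary by Zeppelin version):

"dependencies": [
  {
    "groupArtifactVersion": "com.databricks:spark-xml_2.11:0.4.1",
    "local": false
  }
]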

 

I have confirmed the coordinate is correct for my Spark/Scala version by running spark-shell, and since that works I assume I don’t need to add any additional Maven repo.

Maybe I do need a new repo?

Maybe I need to put the jar in my local repo? interpreter.json says my local repo is /var/lib/zeppelin/.m2/repository, but that directory does not exist.
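
One thing I have not tried yet, and this is just a guess rather than anything from the docs above: bypassing the UI and handing the package straight to spark-submit via zeppelin-env.sh. On EMR I believe the conf dir is /etc/zeppelin/conf and Zeppelin runs as an upstart service, so treat the path and restart command below as assumptions:

$ sudo vi /etc/zeppelin/conf/zeppelin-env.sh
# add this line, then restart Zeppelin:
export SPARK_SUBMIT_OPTIONS="--packages com.databricks:spark-xml_2.11:0.4.1"
$ sudo stop zeppelin && sudo start zeppelin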

 

 

I can use this package from spark shell successfully:

 

$ spark-shell --packages com.databricks:spark-xml_2.11:0.4.1

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "row")        // placeholder: the XML element treated as one row
  .load("path/to/file.xml")       // placeholder path

David Howell

Data Engineering


Re: Dependency management

moon
Administrator
Hi,

Thanks for reporting the problem.

Downloaded dependencies are stored under the 'local-repo' directory by default. For example, after I add com.databricks:spark-xml_2.11:0.4.1 in the Spark interpreter setting:

moon$ ls local-repo/2CD5YP3GK/
scala-library-2.11.7.jar spark-xml_2.11-0.4.1.jar

I see two files downloaded under the ZEPPELIN_HOME/local-repo/[INTERPRETER_ID] directory.
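
Once the jars are there and the interpreter has restarted, a notebook paragraph along these lines should work (just a sketch; the rowTag value and the input path are placeholders):

%spark
val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "row")              // placeholder: element treated as one row
  .load("s3://your-bucket/books.xml")   // placeholder path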

Hope this helps

Thanks,
moon
