UDFs in Zeppelin??

UDFs in Zeppelin??

Ophir Cohen
Hi guys,
One more problem I have encountered using Zeppelin.
I'm running Spark 1.3.1 on YARN with Hadoop 2.4.

I'm trying to create and use a UDF (hc == z.sqlContext == HiveContext):
1. Create and register the UDF:
def getNum(): Int = {
    100
}

hc.udf.register("getNum", getNum _)
2. Then try to use it on an existing table:
%sql select getNum() from filteredNc limit 1

Or:
3. Or call hc directly:
hc.sql("select getNum() from filteredNc limit 1").collect

Both of them fail with
"java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext"
(see below the full exception).

My questions are:
1. Can it be that ZeppelinContext is not available on the Spark nodes?
2. Why does it need ZeppelinContext anyway? Why is it relevant?

The exception:
 WARN [2015-06-28 08:43:53,850] ({task-result-getter-0} Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626, ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError: Lorg/apache/zeppelin/spark/ZeppelinContext;
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
    at java.lang.Class.getDeclaredField(Class.java:1951)
    at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)

<Many more of ObjectStreamClass lines of exception>

Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 103 more
Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
    at java.lang.ClassLoader.findClass(ClassLoader.java:531)
    at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
    ... 105 more
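One possible explanation, offered as an assumption rather than something confirmed in this thread: in a REPL, top-level definitions are compiled into synthetic wrapper classes, and a UDF defined that way can capture the wrapper's outer instance, which in Zeppelin may reference ZeppelinContext. Serializing the UDF then requires that class on the executor classpath. A hedged sketch of a workaround along those lines (the object name `Udfs` is illustrative):

```scala
// Workaround sketch, assuming the failure comes from REPL closure capture:
// defining the UDF inside a plain serializable object keeps the generated
// closure from dragging the interpreter's wrapper class (and its reference
// to ZeppelinContext) into the serialized task.
object Udfs extends Serializable {
  def getNum(): Int = 100
}

// hc is the HiveContext from the notebook, as in the example above.
hc.udf.register("getNum", Udfs.getNum _)
```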

Re: UDFs in Zeppelin??

Ophir Cohen
Guys?
Somebody?
Can it be that Zeppelin does not support UDFs?


Re: UDFs in Zeppelin??

Mina Lee
Hi Ophir,

Can you try below?

def getNum(): Int = {
    100
}
sqlc.udf.register("getNum", getNum _)
sqlc.sql("select getNum() from filteredNc limit 1").show

FYI, sqlContext (== sqlc) is created internally by Zeppelin
and uses a HiveContext as the sqlContext by default
(unless you set useHiveContext to "false" in the interpreter menu).

Hope it helps.


Re: UDFs in Zeppelin??

Ophir Cohen
Thanks for the response,
but I'm not sure what you mean; that is exactly what I tried, and it failed.
As I wrote above, 'hc' is just another name for sqlc (which in turn is another name for z.sqlContext).

I get the same results.


Re: UDFs in Zeppelin??

Ophir Cohen
It looks like the Zeppelin jar is not distributed to the Spark nodes, though I can't understand why it is needed for the UDF.


Re: UDFs in Zeppelin??

Ophir Cohen
BTW,
the same query, on the same cluster but in the Spark shell, returns the expected results.


Re: UDFs in Zeppelin??

Ophir Cohen
I've made some progress on this issue and I think it's a bug...

Apparently, when using registered UDFs on tables that come from Hive, the query fails with the above exception (ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext).
When I create a new table and register it, UDFs work as expected.
See below for the full details and an example.

Can someone tell me whether this is the expected behavior or a bug?
BTW,
I don't mind working on that bug, if you can give me a pointer to the right places.

BTW2:
Registering the SAME DataFrame as a temp table does not solve the problem; only creating a new table out of a new DataFrame does (see below).

Detailed example:
1. I have a table in Hive called 'hive_table', with a string field called 'name' and an int field called 'sid'.

2. I registered a UDF:
def getStr(str: String) = str + "_str"
hc.udf.register("getStr", getStr _)

3. Running the following in Zeppelin:
%sql select getStr(name), * from hive_table
yields the exception:
ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext

4. Creating a new table, as follows (note: collect() returns an Array[Row], which maps cleanly in Scala; collectAsList() returns a java.util.List and does not compile here without conversions):
case class SidName(sid: Int, name: String)
val sidNameList = hc.sql("select sid, name from hive_table limit 10").collect().map(row => SidName(row.getInt(0), row.getString(1)))
val sidNameDF = hc.createDataFrame(sidNameList)
sidNameDF.registerTempTable("tmp_sid_name")

5. Querying the new table in the same fashion:
%sql select getStr(name), * from tmp_sid_name

This time I get the expected results!
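Condensing steps 4 and 5 above into a single snippet (table and column names as in the example; this is only the workaround as described here, not a fix for the underlying bug):

```scala
// Workaround sketch: materialize the Hive rows on the driver, rebuild a
// DataFrame from plain case-class instances, register it as a temp table,
// and apply the UDF to the rebuilt table instead of the Hive-backed one.
case class SidName(sid: Int, name: String)

val sidNames = hc.sql("select sid, name from hive_table limit 10")
  .collect()                                   // Array[Row] on the driver
  .map(row => SidName(row.getInt(0), row.getString(1)))

hc.createDataFrame(sidNames).registerTempTable("tmp_sid_name")
hc.sql("select getStr(name), name, sid from tmp_sid_name").show()
```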



Re: UDFs in Zeppelin??

Ophir Cohen
BTW, this doesn't work either:
val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
sidNameDF2.registerTempTable("tmp_sid_name2")


Re: UDFs in Zeppelin??

moon
Administrator
Really appreciate you sharing the problem.
Very interesting. Do you mind filing an issue on JIRA?

Best,
moon

On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <[hidden email]> wrote:
BTW, this isn't working as well:
val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
   
sidNameDF2.registerTempTable("tmp_sid_name2")

   

On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <[hidden email]> wrote:
I've made some progress in this issue and I think it's a bug...

Apparently, when trying to use registered UDFs on tables that comes from Hive - it returns the above exception (ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext).
When create new table and register it - UDFs works as expected.
You can see below to full details and example.

Can someone tell if it's the expected behavior or a bug?
BTW
I don't mind to work on that bug - if you can give a pointer to the right places.

BTW2
Trying to register the SAME DataFrame as tempTable does not solve the problem - only creating new table out of new DataFrame (see below).

Detailed example
1. I have table in Hive called 'hive_table' with string field called 'name' and int filed called 'sid'

2. I registered a udf:
def getStr(str: String) = str + "_str"
hc.udf.register("getStr", getStr _)

3. Running the following on Zeppelin:
%sql select getStr(name), * from hive_table
yields with excpetion:
ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext

4. Creating new table, as follows:
case class SidName(sid: Int, name: String)
val sidNameList = hc.sql("select sid, name from hive_table limit 10").collectAsList().map(row => new SidName(row.getInt(0), row.getString(1)))
val sidNameDF = hc.createDataFrame(sidNameList)
sidNameDF.registerTempTable("tmp_sid_name")

5. Query the new table in the same fashion:
%sql select getStr(name), * from tmp_sid_name

This time I get the expected results!


On Mon, Jun 29, 2015 at 5:16 PM, Ophir Cohen <[hidden email]> wrote:
BTW
The same query, on the same cluster but on Spark shell return the expected results.

On Mon, Jun 29, 2015 at 3:24 PM, Ophir Cohen <[hidden email]> wrote:
It looks that Zeppelin jar does not distributed to Spark nodes, though I can't understand why it needed for the UDF.

On Mon, Jun 29, 2015 at 3:23 PM, Ophir Cohen <[hidden email]> wrote:
Thanks for the response,
I'm not sure what do you mean, it exactly what I tried and failed.
As I wrote above, 'hc' is actually different name to sqlc (that is different name to z.sqlContext).

I get the same results.


On Mon, Jun 29, 2015 at 2:12 PM, Mina Lee <[hidden email]> wrote:
Hi Ophir,

Can you try below?

def getNum(): Int = {
    100
}
sqlc.udf.register("getNum", getNum _)
sqlc.sql("select getNum() from filteredNc limit 1").show

FYI sqlContext(==sqlc) is internally created by Zeppelin
and use hiveContext as sqlContext by default.
(If you did not change useHiveContext to be "false" in interpreter menu.)

Hope it helps.

On Mon, Jun 29, 2015 at 7:55 PM, Ophir Cohen <[hidden email]> wrote:
Guys?
Somebody?
Can it be that Zeppelin does not support UDFs?

On Sun, Jun 28, 2015 at 11:53 AM, Ophir Cohen <[hidden email]> wrote:
Hi Guys,
One more problem I have encountered using Zeppelin.
Using Spark 1.3.1 on Yarn Hadoop 2.4

I'm trying to create and use UDF (hc == z.sqlContext == HiveContext):
1. Create and register the UDF:
def getNum(): Int = {
    100
}

hc.udf.register("getNum",getNum _)
2. And I try to use on exist table:
%sql select getNum() from filteredNc limit 1

Or:
3. Trying using direct hc:
hc.sql("select getNum() from filteredNc limit 1").collect

Both of them yield with
"java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext"
(see below the full exception).

And my questions are:
1. Can it be that ZeppelinContext is not available on the Spark nodes?
2. Why does it need ZeppelinContext anyway? Why is it relevant?

The exception:
 WARN [2015-06-28 08:43:53,850] ({task-result-getter-0} Logging.scala[logWarning]:71) - Lost task 0.2 in stage 23.0 (TID 1626, ip-10-216-204-246.ec2.internal): java.lang.NoClassDefFoundError: Lorg/apache/zeppelin/spark/ZeppelinContext;
    at java.lang.Class.getDeclaredFields0(Native Method)
    at java.lang.Class.privateGetDeclaredFields(Class.java:2499)
    at java.lang.Class.getDeclaredField(Class.java:1951)
    at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1659)

<Many more of ObjectStreamClass lines of exception>

Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:69)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 103 more
Caused by: java.lang.ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext
    at java.lang.ClassLoader.findClass(ClassLoader.java:531)
    at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.scala:26)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:34)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.scala:30)
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:64)
    ... 105 more








Re: UDFs in Zeppelin??

Ophir Cohen
Please let me know how I can help further.

On Thu, Jul 2, 2015 at 2:35 AM, moon soo Lee <[hidden email]> wrote:
Really appreciate you sharing the problem.
Very interesting. Do you mind filing an issue in JIRA?

Best,
moon

On Tue, Jun 30, 2015 at 4:32 AM Ophir Cohen <[hidden email]> wrote:
BTW, this doesn't work either:
val sidNameDF = hc.sql("select sid, name from hive_table limit 10")
val sidNameDF2 = hc.createDataFrame(sidNameDF.rdd, sidNameDF.schema)
sidNameDF2.registerTempTable("tmp_sid_name2")

On Tue, Jun 30, 2015 at 1:45 PM, Ophir Cohen <[hidden email]> wrote:
I've made some progress on this issue and I think it's a bug...

Apparently, when I try to use registered UDFs on tables that come from Hive, it throws the above exception (ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext).
When I create a new table and register it, UDFs work as expected.
See below for the full details and an example.

Can someone tell me whether this is the expected behavior or a bug?
BTW
I don't mind working on that bug - if you can give me a pointer to the right places.

BTW2
Registering the SAME DataFrame as a temp table does not solve the problem - only creating a new table from a new DataFrame does (see below).

Detailed example
1. I have a table in Hive called 'hive_table', with a string field called 'name' and an int field called 'sid'.

2. I registered a udf:
def getStr(str: String) = str + "_str"
hc.udf.register("getStr", getStr _)

3. Running the following in Zeppelin:
%sql select getStr(name), * from hive_table
yields the exception:
ClassNotFoundException: org.apache.zeppelin.spark.ZeppelinContext

4. Creating new table, as follows:
case class SidName(sid: Int, name: String)
val sidNameList = hc.sql("select sid, name from hive_table limit 10").collectAsList().map(row => new SidName(row.getInt(0), row.getString(1)))
val sidNameDF = hc.createDataFrame(sidNameList)
sidNameDF.registerTempTable("tmp_sid_name")

5. Query the new table in the same fashion:
%sql select getStr(name), * from tmp_sid_name

This time I get the expected results!





Re: UDFs in Zeppelin??

IT CTO
Does this happen in local mode as well, or just on an external cluster?
Regarding the repro - %sql select getNum() from filteredNc limit 1
I guess filteredNc is some table you have? When I tried it on my local machine I got:
no such table filteredNc; line 1 pos 21
Eran




Re: UDFs in Zeppelin??

Ophir Cohen
It does not happen in local mode.
Actually, whenever it runs in the same process, it works great.
It looks like the Zeppelin jar is somehow not distributed to the nodes.
Still, it's strange, as registering the UDF and the UDF itself do not need ZeppelinContext (at least not explicitly).

And yes, filteredNc is a local table; I just use it so I can call the UDF. You can try that on any table.
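For what it's worth, a possible explanation for the implicit dependency (a sketch, not verified against Zeppelin's actual generated code): the Spark REPL compiles each interpreted paragraph into a wrapper class, and every value in scope - including Zeppelin's `z` (a ZeppelinContext) - becomes a field of that wrapper. A function defined in a paragraph is a method on the wrapper, so eta-expanding it into a function value closes over `this`, and serializing that closure for the executors drags the wrapper (and its ZeppelinContext field) along. Roughly:

```scala
// Hypothetical illustration of REPL closure capture; the class and field
// names here are made up and do not match Spark's or Zeppelin's real code.
class ReplLineWrapper(val z: AnyRef /* stand-in for ZeppelinContext */)
    extends Serializable {
  def getNum(): Int = 100        // compiles to a method on the wrapper
  val f: () => Int = getNum _    // eta-expansion: `f` closes over `this`,
                                 // so serializing `f` also serializes `z`
}

// Untested workaround sketch: a standalone object captures no REPL state,
// so registering its method should not pull in the interpreter wrapper:
object Udfs extends Serializable {
  def getNum(): Int = 100
}
// hc.udf.register("getNum", Udfs.getNum _)
```

If that is what happens here, it would explain why the UDF "needs" ZeppelinContext without referencing it explicitly.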




Re: UDFs in Zeppelin??

IT CTO
I think you should add these notes to the JIRA issue, as the problem is not clear from the issue itself. (Sorry that this doesn't help solve the problem itself :-))




Re: UDFs in Zeppelin??

Ophir Cohen
Will do so soon.
10x
