CSV spark package not working in v0.6.1, spark 2.0, scala 2.11

4 messages

CSV spark package not working in v0.6.1, spark 2.0, scala 2.11

Abul Basar
Hello, 

It is exciting to see the new 0.6.1 release so soon after 0.6.

I am test driving 0.6.1 with Spark 2.0 (Scala 2.11). RDD and DataFrame operations work fine, but I am facing a problem while using the CSV package (https://github.com/databricks/spark-csv).

I added "com.databricks:spark-csv_2.11:1.4.0" to the interpreter dependencies through the UI, restarted Zeppelin, and am trying the following code.


val df = spark.sqlContext.read.
  format("com.databricks.spark.csv").
  options(Map("header" -> "true", "inferSchema" -> "true")).
  load("hdfs:// ... /S&P")

df.printSchema


The above statement errors out with the following message:

java.lang.NoSuchMethodError: com.univocity.parsers.csv.CsvParserSettings.setUnescapedQuoteHandling(Lcom/univocity/parsers/csv/UnescapedQuoteHandling;)V
at org.apache.spark.sql.execution.datasources.csv.CsvReader.parser$lzycompute(CSVParser.scala:50)
at org.apache.spark.sql.execution.datasources.csv.CsvReader.parser(CSVParser.scala:35)
at org.apache.spark.sql.execution.datasources.csv.LineCsvReader.parseLine(CSVParser.scala:117)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:59)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:392)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:392)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:391)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
... 46 elided


I successfully tested the same code in the REPL. The above error seems to be a bug introduced in 0.6.1; it works fine in 0.6.0.

Any ideas about how to resolve the issue?

Thanks!
- AB


Re: CSV spark package not working in v0.6.1, spark 2.0, scala 2.11

Mina Lee-2
Hi Abul,

spark-csv is integrated into Spark 2.0 itself, so you don't need to load the spark-csv dependency anymore.

Could you try the code below instead?

val df = sqlContext.read.
  options(Map("header" -> "true", "inferSchema" -> "true")).
  csv("hdfs:// ... /S&P")

df.printSchema

Hope this solves your issue!

Mina
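For completeness, Spark 2.0 also exposes the same reader directly on the SparkSession, so the sqlContext indirection isn't needed. A minimal sketch, assuming a running SparkSession bound to `spark` (as in Zeppelin's Spark interpreter) and keeping the thread's placeholder path:

```scala
// Spark 2.0: the DataFrameReader is available straight from the SparkSession.
// "header" treats the first line as column names; "inferSchema" makes Spark
// sample the data to guess column types.
val df = spark.read.
  option("header", "true").
  option("inferSchema", "true").
  csv("hdfs:// ... /S&P")

df.printSchema
```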

On Wed, Aug 17, 2016 at 11:43 AM Abul Basar <[hidden email]> wrote:


Re: CSV spark package not working in v0.6.1, spark 2.0, scala 2.11

Vinay Shukla
Abul,

Mina is right. Until Spark 1.6, CSV parsing was available as a separate Spark package; with Spark 2.0, CSV parsing is built in. Zeppelin 0.6.1 ships with Spark 2.0.

Thanks,
Vinay
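As a side note, since the failure above happened during schema inference, supplying an explicit schema is another way to use the built-in reader, and it also skips the extra pass over the data that inferSchema needs. A sketch assuming a SparkSession named `spark`; the column names here are hypothetical placeholders, not the actual layout of the S&P file:

```scala
import org.apache.spark.sql.types.{DoubleType, StringType, StructField, StructType}

// Hypothetical columns for illustration; replace with the file's real header.
val schema = StructType(Seq(
  StructField("date", StringType, nullable = true),
  StructField("close", DoubleType, nullable = true)
))

// With an explicit schema, Spark does not need an inference pass at all.
val df = spark.read.
  schema(schema).
  option("header", "true").
  csv("hdfs:// ... /S&P")
```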

On Wednesday, August 17, 2016, Mina Lee <[hidden email]> wrote:


Re: CSV spark package not working in v0.6.1, spark 2.0, scala 2.11

Abul Basar
Hello Vinay and Mina,

Sorry about the late response. I got a chance today to verify the built-in CSV reader in Spark 2.0. It worked for me. Thanks for the direction.


AB
