Task not serializable error when I try to cache a Spark SQL table


shyla
Hello all,

I am getting an org.apache.spark.SparkException: Task not serializable error when I try to cache a Spark SQL table. I am applying a UDF to one column of the table and want to cache the resulting table. The paragraph executes successfully when I don't cache it.
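For context: Spark uses Java serialization to ship the functions a query needs, including registered UDFs, to the executors, and Task not serializable means one of those functions, or an object it captures, could not be serialized. The plain-Scala sketch below (no Spark needed) illustrates one common way this happens, assuming the notebook wraps each paragraph's definitions in a serializable enclosing object that also holds a non-serializable field: a function built from a method such as def fn1 captures that enclosing object, while a standalone function value does not. NotebookWrapper, handle, asMethodRef, fn1Value and isSerializable are hypothetical names chosen for illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for the object a notebook/REPL wraps a paragraph in.
// The wrapper itself is Serializable, but one of its fields is not.
class NotebookWrapper extends Serializable {
  val handle: AnyRef = new Object // java.lang.Object is not Serializable

  // A method, like `def fn1` in the post.
  def fn1(res: String): Int = 100

  // Turning the method into a function means calling this.fn1,
  // so the resulting closure captures the whole wrapper (and `handle`).
  def asMethodRef: String => Int = (res: String) => fn1(res)
}

object SerializationDemo {
  // A standalone function value that refers to nothing outside itself.
  val fn1Value: String => Int = (_: String) => 100

  // Roughly the kind of check Spark makes before shipping a closure:
  // attempt Java serialization and report whether it succeeded.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }

  def main(args: Array[String]): Unit = {
    println(isSerializable(fn1Value))                          // true
    println(isSerializable(new NotebookWrapper().asMethodRef)) // false
  }
}
```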

Please help! Thanks

UDF:
def fn1(res: String): Int = {
  100
}
spark.udf.register("fn1", fn1(_: String): Int)

spark
  .read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "k", "table" -> "t"))
  .load
  .createOrReplaceTempView("t1")

val df1 = spark.sql("SELECT col1, col2, fn1(col3) FROM t1")

df1.createOrReplaceTempView("t2")

spark.catalog.cacheTable("t2")
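A minimal sketch of the same pipeline, assuming the error comes from the eta-expanded method fn1(_: String) capturing the notebook's enclosing object (the thread does not confirm this): here the UDF is registered as a standalone function value instead. To keep the sketch runnable without Cassandra, the table k.t is replaced by a small hypothetical in-memory DataFrame, and CachedUdfSketch, run and the column alias f are illustrative names. It also runs an action after cacheTable; the original paragraph contains no action at all, which may be why it appears to succeed when nothing is cached.

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the post's pipeline with the UDF registered as a function
// value rather than an eta-expanded method (assumed workaround).
object CachedUdfSketch {
  def run(spark: SparkSession): Long = {
    import spark.implicits._

    // Standalone function value: nothing from an enclosing object is captured.
    val fn1: String => Int = (_: String) => 100
    spark.udf.register("fn1", fn1)

    // Hypothetical in-memory stand-in for the Cassandra table k.t,
    // so the sketch runs without the Cassandra connector.
    Seq(("a", 1, "x"), ("b", 2, "y"))
      .toDF("col1", "col2", "col3")
      .createOrReplaceTempView("t1")

    val df1 = spark.sql("SELECT col1, col2, fn1(col3) AS f FROM t1")
    df1.createOrReplaceTempView("t2")

    // cacheTable does not compute anything by itself;
    // the first action (count here) populates the cache.
    spark.catalog.cacheTable("t2")
    spark.table("t2").count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cached-udf-sketch")
      .getOrCreate()
    try {
      println(run(spark)) // 2 rows in the stand-in table
    } finally {
      spark.stop()
    }
  }
}
```

In a Zeppelin paragraph, where spark is already defined, the body of run could be pasted as-is with the original Cassandra read restored in place of the Seq(...).toDF(...) stand-in.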

Re: Task not serializable error when I try to cache a Spark SQL table

Jongyoul Lee
Hi,

Which version of Spark are you using?

On Thu, Jun 1, 2017 at 10:44 AM, shyla deshpande wrote:



--
이종열, Jongyoul Lee, 李宗烈