Unable to set aggregate user defined case classes

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view

Unable to set aggregate user defined case classes

Anqi Chen

I’m observing a weird behavior in zeppelin %spark. I have the following in my paragraph:

case class Test(a: Int, b: Int)
val a = Test(1,2)
val b = Test(1,2)
val c = Test(2,3)
val l = List(a,b,c)
val rdd = spark.sparkContext.parallelize(l)
rdd.map(v => (1, v)).aggregateByKey(scala.collection.mutable.HashSet.empty[Test])((result, item) => result + item, (result1, result2) => result1 ++ result2).collect()

I would expect the result to be Array((1, Set(Test(1,2), Test(2,3)))), however, I’m actually seeing Array(1, Set(Test(1,2), Test(1,2), Test(2,3)))). Why is spark unable to set aggregate on my case class? Is this a zeppelin issue or spark problem?

I have confirmed that doing Set(a,b,c) in scala REPL returns back Set(Test(1,2), Test(2,3)), as expected.