%sql counts from multiple columns by checkbox

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

%sql counts from multiple columns by checkbox

Jonathan Davidson
Hello all, first time poster! I tried looking in forums and documentation but nothing seems to suit the specific nature of my question.

I am trying to display counts based on checkbox functionality in Zeppelin.The base table can be pretty simple:
%spark
case class Data(a: Int, b: String)
val df = Seq(
  Data(1, "one"),
  Data(2, "two"),
  Data(3,null)).toDF()
df.collect()
df.registerTempTable("table")

I want counts in SQL:
%sql
SELECT COUNT(a),COUNT(b) FROM table
--gives 3 and 2, respectively

I would think to use 
SELECT COUNT(${checkbox:test=a,a|b}) FROM table
However, the problem is that it doesn't scale to multiple checked values at the same time. If you use more than one, it comes out as the equivalent of 
SELECT COUNT(a,b) FROM ds2
which returns one value of 2; I want two values 3 and 2 as stated above

Basically, I've tried turning the values into COUNT(a) and COUNT(b) but it gets confused:
SELECT ${checkbox(whatever):test=COUNT(a)|COUNT(b),COUNT(a)|COUNT(b)} FROM ds2
cannot recognize input near 'FROM' 'ds2' '<EOF>' in select clause; line 1 pos 8

I've also tried simple manipulation like adding stuff in parenthesis after ${checkbox...
SELECT ${checkbox(whatever):test=a|b,a|b} FROM ds2
cannot resolve 'awhateverb' given input columns: [a, b]; line 1 pos 7

Does anyone have advice?

Thanks,
Jonathan
Reply | Threaded
Open this post in threaded view
|

Re: %sql counts from multiple columns by checkbox

Jonathan Davidson
For anyone facing the same issue, I've sort of solved my issue by trying to convert the entire statement into a string. Seems like it should be unnecessary, but feel free to offer a better method!

Inline image 1

Code:
%spark
case class Data(a: Int, b: String)
val ds2 = Seq(
  Data(1, "one"),
  Data(2, "two"),
  Data(3,null)).toDF()

ds2.collect()
ds2.registerTempTable("ds2")

%spark
val options = Seq(("a","a"),("b","b"))
val selectstatement = "SELECT COUNT("+z.checkbox("columns",options).mkString("), COUNT(")+") FROM ds2"
println(selectstatement)

%spark
ds2.sqlContext.sql(selectstatement).show

Refinements will be possible now that it works.


On Tue, Jun 6, 2017 at 4:06 PM, Jonathan Davidson <[hidden email]> wrote:
Hello all, first time poster! I tried looking in forums and documentation but nothing seems to suit the specific nature of my question.

I am trying to display counts based on checkbox functionality in Zeppelin.The base table can be pretty simple:
%spark
case class Data(a: Int, b: String)
val df = Seq(
  Data(1, "one"),
  Data(2, "two"),
  Data(3,null)).toDF()
df.collect()
df.registerTempTable("table")

I want counts in SQL:
%sql
SELECT COUNT(a),COUNT(b) FROM table
--gives 3 and 2, respectively

I would think to use 
SELECT COUNT(${checkbox:test=a,a|b}) FROM table
However, the problem is that it doesn't scale to multiple checked values at the same time. If you use more than one, it comes out as the equivalent of 
SELECT COUNT(a,b) FROM ds2
which returns one value of 2; I want two values 3 and 2 as stated above

Basically, I've tried turning the values into COUNT(a) and COUNT(b) but it gets confused:
SELECT ${checkbox(whatever):test=COUNT(a)|COUNT(b),COUNT(a)|COUNT(b)} FROM ds2
cannot recognize input near 'FROM' 'ds2' '<EOF>' in select clause; line 1 pos 8

I've also tried simple manipulation like adding stuff in parenthesis after ${checkbox...
SELECT ${checkbox(whatever):test=a|b,a|b} FROM ds2
cannot resolve 'awhateverb' given input columns: [a, b]; line 1 pos 7

Does anyone have advice?

Thanks,
Jonathan