Returning more than the default 1000 rows

Returning more than the default 1000 rows

Paul-Armand Verhaegen
Hi,

I am having trouble getting Zeppelin 0.7.1 (in %python or %spark.pyspark) to return more than the default 1000 rows (from a pandas DataFrame) in a visualisation or CSV download.
I tried increasing the values of all maxResults settings in interpreter.json and restarted Zeppelin after the config change, but to no avail.

Can someone point me in the right direction?

Thanks,
Paul

Re: Returning more than the default 1000 rows

Paul-Armand Verhaegen

Thanks for your reply. Based on your suggestion, I edited conf/zeppelin-env.sh, adding:
export ZEPPELIN_SPARK_MAXRESULT=10000      # Max number of Spark SQL results to display. 1000 by default.
export ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE=10240000       # Size in characters of the maximum text message to be received by websocket. Defaults to 1024000

I restarted Zeppelin, but it still does not show more than 1000 rows (in neither the visualisation nor the CSV download).
I also double-checked the settings by adding "env" to common.sh to confirm that they are properly sourced into the shell, and they are.
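
The same check can also be run from inside a notebook paragraph. The following is a minimal sketch, assuming the interpreter process inherits the environment set in conf/zeppelin-env.sh (which may not hold for every deployment); the helper name report_settings is made up for illustration.

```python
import os

# The two variables exported in conf/zeppelin-env.sh in this thread.
SETTINGS = (
    "ZEPPELIN_SPARK_MAXRESULT",
    "ZEPPELIN_WEBSOCKET_MAX_TEXT_MESSAGE_SIZE",
)


def report_settings(environ=os.environ):
    """Hypothetical helper: map each setting name to its value, or a marker if unset."""
    return {name: environ.get(name, "<not set>") for name in SETTINGS}


# Print what the current process actually sees.
for name, value in report_settings().items():
    print(name, "=", value)
```

Seeing the expected values here only confirms that the variables reach the process; it does not show that the interpreter uses them.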

Paul

On 20 Apr 2017, at 23:28, So good <[hidden email]> wrote:

The zeppelin configuration file has settings for the maximum number of rows and the maximum size of the file.

------------------ Original Message ------------------
Sent: Friday, 21 April 2017, 3:46
To: "users" <[hidden email]>;
Subject: Returning more than the default 1000 rows




Re: Returning more than the default 1000 rows

Paul-Armand Verhaegen

Running z.__dict__ in a Zeppelin %python paragraph shows that max_result does not pick up the configured value (the configuration setting is not applied):

$ z.__dict__ 

{'javaList': <py4j.java_gateway.JavaClass object at 0x7fe187b776d0>, 'paramOption': <py4j.java_gateway.JavaClass object at 0x7fe187b77690>, 'z': JavaObject id=t, 'max_result': 2000, '_displayhook': <function displayhook at 0x7fe18291baa0>}

As a workaround, I now set z.max_result = 2000 to increase the size of the returned CSV, and that works fine.
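
The effect of the workaround can be sketched outside Zeppelin with a stand-in object. StandInContext below is hypothetical and only mimics a row limit stored in max_result; it is not the real class behind z, and its show method is a simplification.

```python
class StandInContext:
    """Hypothetical stand-in for Zeppelin's `z` object; not the real class."""

    def __init__(self, max_result=1000):
        # Row limit, analogous to the max_result attribute seen in z.__dict__.
        self.max_result = max_result

    def show(self, rows):
        # Return at most max_result rows, mimicking the truncation in this thread.
        return rows[: self.max_result]


z = StandInContext()
rows = [(i, i * i) for i in range(1500)]

before = len(z.show(rows))  # truncated at the default limit
z.max_result = 2000         # the workaround from this thread
after = len(z.show(rows))   # all rows now fit under the limit

print(before, after)
```

In a real %python paragraph, only the assignment z.max_result = 2000 would be needed before displaying the DataFrame.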

Thanks,
Paul


Re: Returning more than the default 1000 rows

moon
Administrator
Thanks for reporting the problem and sharing the workaround.
It looks like PythonInterpreter uses a hardcoded value [1] instead of reading max_result from the interpreter property.
It definitely looks like a bug. Would you mind filing an issue in the project JIRA [2]?
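
To illustrate the difference in general terms, here is a rough sketch of the two patterns. It is not Zeppelin's actual code; the function names are made up, and the property key zeppelin.python.maxResult is an assumption used only for the example.

```python
DEFAULT_MAX_RESULT = 1000


def max_result_hardcoded(properties):
    # Bug pattern: the configured value is ignored entirely.
    return DEFAULT_MAX_RESULT


def max_result_from_properties(properties, key="zeppelin.python.maxResult"):
    # Read the limit from the interpreter properties, falling back to the default.
    # The key name is an assumption for this sketch.
    return int(properties.get(key, DEFAULT_MAX_RESULT))


props = {"zeppelin.python.maxResult": "10000"}
print(max_result_hardcoded(props))
print(max_result_from_properties(props))
```

With the first pattern, changing the interpreter setting has no effect, which matches the behaviour reported above.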

Thanks,
moon



Re: Returning more than the default 1000 rows

Paul-Armand Verhaegen

I've filed ZEPPELIN-2447 for this bug.
I'll see if I can create a PR on GitHub too.

Thanks for the follow-up,
Paul 
