There are limits to how much data the download to CSV button will download (1.5MB? 3500 rows?), which limits Zeppelin's usefulness for our BI teams. We hit this limit long before we run into any problems with showing too many rows of data in Zeppelin.
Unfortunately (fortunately?), Hue is the other tool the BI team has been using, and there they have no problem downloading much larger datasets to CSV. This is not a requirement I've ever run into in the way I use Zeppelin, since I would just use Spark to write the data out. However, the BI team is not allowed to run Spark jobs (they use Hive via JDBC), so that download to CSV button is pretty important to them.
Would it be possible to significantly increase the limit? Even better, would it be possible to download more data than is shown? I assume this is the type of thing I would need to open a ticket for, but I wanted to ask here first.
It would be a good idea to introduce a way in Zeppelin to download full datasets without actually visualizing them.
Not sure if this helps, but we taught our users to use %sh hadoop fs -getmerge /hadoop/path/dir/ /some/nfs/mount/
for large files (they sometimes have to download datasets with millions of records).
They run Zeppelin on edge nodes that have NFS mounts to a drop zone.
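For anyone unfamiliar with it, hadoop fs -getmerge takes a source directory and concatenates the files in it (for example the part-* files a Hive or Spark job writes) into a single local file. Below is a rough local-filesystem illustration of that concatenation step in Python. merge_parts is a hypothetical name; the sketch never talks to HDFS and is not Hadoop's implementation.

```python
import tempfile
from pathlib import Path


def merge_parts(src_dir: str, dest_file: str) -> int:
    """Concatenate the files in src_dir, in sorted name order, into dest_file.

    Local-filesystem illustration only: it never touches HDFS.
    Returns the number of bytes written.
    """
    written = 0
    with open(dest_file, "wb") as out:
        for part in sorted(p for p in Path(src_dir).iterdir() if p.is_file()):
            # Choice made for this sketch (not necessarily getmerge's behaviour):
            # skip marker/hidden files such as _SUCCESS or .crc checksums.
            if part.name.startswith(("_", ".")):
                continue
            data = part.read_bytes()
            out.write(data)
            written += len(data)
    return written


if __name__ == "__main__":
    # Build a fake job output directory and merge it into one CSV.
    with tempfile.TemporaryDirectory() as tmp:
        job_dir = Path(tmp, "job_output")
        job_dir.mkdir()
        Path(job_dir, "part-00001").write_text("3,c\n4,d\n")
        Path(job_dir, "part-00000").write_text("1,a\n2,b\n")
        Path(job_dir, "_SUCCESS").write_text("")
        merged = Path(tmp, "merged.csv")
        print(merge_parts(str(job_dir), str(merged)), "bytes written")
        print(merged.read_text(), end="")
```

The point of the getmerge approach is that the merged file lands on the NFS drop zone directly, so nothing has to pass through the browser.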
P.S. Hue has a limit too: 100k rows by default.
Not sure how far it scales up.
On Tue, May 2, 2017 at 10:41 AM, Paul Brenner <[hidden email]> wrote:
We came across this issue as well. Zeppelin's CSV export uses the data URI scheme, which base64-encodes all the rows into a single string. Chrome seems to crash with more than a few thousand rows, but Firefox has been able to handle over 100k for me. However, the Zeppelin notebook itself becomes slow at that point. I would also like better support for exporting a large set of rows; or perhaps another tool is preferred for this?
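To make the data URI point concrete: with this scheme the browser holds the entire CSV as one string, and base64 adds roughly a third on top (4 output characters for every 3 input bytes). Below is a small Python sketch of the general approach described above, not Zeppelin's actual export code; csv_data_uri, encoded_size and PREFIX are illustrative names.

```python
import base64
import csv
import io

PREFIX = "data:text/csv;base64,"


def csv_data_uri(rows: list[list[object]]) -> str:
    """Serialize every row to CSV, then base64 the whole thing into one data: URI."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    payload = buf.getvalue().encode("utf-8")
    return PREFIX + base64.b64encode(payload).decode("ascii")


def encoded_size(raw_bytes: int) -> int:
    """Base64 emits 4 characters per 3 input bytes, padding the last group."""
    return 4 * ((raw_bytes + 2) // 3)


if __name__ == "__main__":
    # Hypothetical result set: 100,000 rows of three short columns.
    rows = [["id", "name", "value"]] + [[i, f"user{i}", i * 0.5] for i in range(100_000)]
    uri = csv_data_uri(rows)
    raw_len = len(base64.b64decode(uri[len(PREFIX):]))
    print(f"CSV: {raw_len:,} bytes -> data URI: {len(uri):,} characters")
```

Running it prints the raw CSV size next to the URI length, which differ by the base64 factor plus the short prefix. Every row has to be serialized and encoded up front, before the download can even start.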
On Tue, May 2, 2017 at 10:00 AM, Ruslan Dautkhanov <[hidden email]> wrote:
I think whether this is an issue or not depends a lot on how you use Zeppelin and what tools you need to integrate with. Sadly, Excel is still around as a data processing tool, and many people whom I introduce to Zeppelin are quite proficient with it; hence the desire to export to CSV in a trivial manner. Or rather, the mere presence of the "download CSV" button leads them to expect it to work for reasonably sized data (i.e. up to around 10^6 rows).
I do prefer Ruslan's idea, but I think Zeppelin should include something similar out of the box. The key requirement should be that the data doesn't have to travel through the notebook interface; instead, it is written to a temporary folder and then served via a download link. The downside of this approach is that, ideally, you'd want this kind of operation to be interpreter-agnostic. In that case, every interpreter would need to offer an interface that allows the data to be collected into a temporary folder local to Zeppelin.
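A rough sketch of that proposal in Python, under loudly stated assumptions: RowSource stands in for the per-interpreter interface, and CsvExportStore for a Zeppelin-side service that streams rows into a temporary folder and returns a link an HTTP handler could serve. None of these names exist in Zeppelin; they are all hypothetical.

```python
import csv
import tempfile
import uuid
from pathlib import Path
from typing import Iterator, Protocol, Sequence


class RowSource(Protocol):
    """Hypothetical hook each interpreter would implement: stream rows, no rendering."""

    def header(self) -> Sequence[str]: ...

    def rows(self) -> Iterator[Sequence[object]]: ...


class CsvExportStore:
    """Hypothetical Zeppelin-side store: writes exports to a temp folder, returns a link."""

    def __init__(self, base_url: str = "/export/") -> None:
        self._dir = Path(tempfile.mkdtemp(prefix="csv-export-"))
        self._base_url = base_url
        self._files: dict[str, Path] = {}

    def export(self, source: RowSource) -> str:
        token = uuid.uuid4().hex
        path = self._dir / f"{token}.csv"
        # Rows are pulled from the source iterator and written to disk as they
        # arrive, so the result set never passes through the notebook UI.
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(source.header())
            writer.writerows(source.rows())
        self._files[token] = path
        return self._base_url + token

    def resolve(self, link: str) -> Path:
        """Map a download link back to the file an HTTP handler would stream."""
        return self._files[link.rsplit("/", 1)[-1]]


class RangeSource:
    """Toy RowSource; a real one might wrap a JDBC cursor or a Spark iterator."""

    def __init__(self, n: int) -> None:
        self.n = n

    def header(self) -> Sequence[str]:
        return ["id", "square"]

    def rows(self) -> Iterator[Sequence[object]]:
        for i in range(self.n):
            yield [i, i * i]


if __name__ == "__main__":
    store = CsvExportStore()
    link = store.export(RangeSource(1_000_000))
    print(link, "->", store.resolve(link))
```

Keeping the interpreter side down to "give me a header and an iterator of rows" is what would make this interpreter-agnostic. The sketch never cleans up its temp folder; a real version would also need expiry, access control, and probably a size cap.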
Nonetheless, to turn Zeppelin into the serve-it-all solution it could be, I do believe that "fixing" the CSV export is important. I'd definitely vote for a Jira ticket to advance this issue.
On Tue, May 2, 2017 at 9:33 PM, Kevin Niemann <[hidden email]> wrote:
I'm not sure what the best solution is, but I created a ticket here:
On Wed, May 03, 2017 at 4:01 AM, Rick Moritz wrote: