Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Sofiane Cherchalli
Hi,

I have a standalone cluster with one master and one worker, running on separate nodes. Zeppelin runs on a separate node too, in client mode.

When I run a notebook that reads a CSV file located on the worker node with the Spark-CSV package, Zeppelin tries to read the CSV locally and fails, because the CSV is on the worker node and not on the Zeppelin node.

Is this the expected behavior? 

Thanks.

Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Jongyoul Lee
Could you test if it works with spark-shell?

--
이종열, Jongyoul Lee, 李宗烈

Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Sofiane Cherchalli
Yes, I already tested with spark-shell and pyspark, with the same result.

Can't I use the Linux filesystem to read the CSV, e.g. file:///data/file.csv? My understanding is that the job is sent to the worker and executed there, isn't it?

Thanks.
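[A note on file:// semantics: with a non-shared local path, the driver and each executor resolve the URI against their own local filesystem, so a file that exists only on the worker is invisible to the Zeppelin (driver) side. A toy illustration of that resolution; the cluster layout and path are hypothetical stand-ins, not real Spark API calls:]

```python
from urllib.parse import urlparse

def can_read(uri, local_paths):
    # Each Spark process resolves a file:// URI against its *own*
    # local filesystem; local_paths stands in for what that node has.
    return urlparse(uri).path in local_paths

# Hypothetical layout: the CSV exists only on the worker node.
worker_fs = {"/data/file.csv"}
zeppelin_fs = set()  # driver node has no copy

uri = "file:///data/file.csv"
print(can_read(uri, worker_fs))    # True  (executor side could read it)
print(can_read(uri, zeppelin_fs))  # False (driver side fails)
```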


Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Meethu Mathew
Try putting the CSV at the same path on all the nodes, or on a mount point path that is accessible from all the nodes.

Regards, 
Meethu Mathew
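[A minimal sketch of the precondition behind this advice: the read only works when every node, driver included, sees the file at the identical path. Hostnames and the mount path are hypothetical; on a real cluster the check would run over ssh on each node:]

```python
def path_on_all_nodes(path, node_filesystems):
    # Stand-in for checking each node: a file:// input works only when
    # the driver and every worker see the same path locally.
    return all(path in fs for fs in node_filesystems.values())

# Hypothetical cluster with a shared mount visible everywhere.
nodes = {
    "zeppelin": {"/mnt/shared/data/file.csv"},
    "master": {"/mnt/shared/data/file.csv"},
    "worker": {"/mnt/shared/data/file.csv"},
}
print(path_on_all_nodes("/mnt/shared/data/file.csv", nodes))  # True
```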




Re: Spark-CSV - Zeppelin tries to read CSV locally in Standalone mode

Sofiane Cherchalli
I've put the CSV on the worker node, since the job runs on the worker. I didn't put it on the master, because I believe the master doesn't run jobs.

If I put the CSV on the Zeppelin node at the same path as on the worker, it reads the CSV and writes a _SUCCESS file locally. The job runs on the worker too but doesn't terminate: the result stays under a _temporary directory on the worker.

worker - ls -laRt /data/02.csv/                                                                                                                                                     
02.csv/:
total 0
drwxr-xr-x. 3 root root 24 Apr 28 09:55 .
drwxr-xr-x. 3 root root 15 Apr 28 09:55 _temporary
drwxr-xr-x. 3 root root 64 Apr 28 09:55 ..

02.csv/_temporary:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 0
drwxr-xr-x. 3 root root  15 Apr 28 09:55 .
drwxr-xr-x. 3 root root  24 Apr 28 09:55 ..

02.csv/_temporary/0:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 .
drwxr-xr-x. 2 root root   6 Apr 28 09:56 _temporary
drwxr-xr-x. 2 root root 129 Apr 28 09:56 task_20170428095632_0005_m_000000
drwxr-xr-x. 2 root root 129 Apr 28 09:55 task_20170428095516_0002_m_000000
drwxr-xr-x. 3 root root  15 Apr 28 09:55 ..

02.csv/_temporary/0/_temporary:
total 0
drwxr-xr-x. 2 root root   6 Apr 28 09:56 .
drwxr-xr-x. 5 root root 106 Apr 28 09:56 ..

02.csv/_temporary/0/task_20170428095632_0005_m_000000:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:56 .part-00000-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:56 part-00000-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv
drwxr-xr-x. 2 root root   129 Apr 28 09:56 .

02.csv/_temporary/0/task_20170428095516_0002_m_000000:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:55 .part-00000-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:55 part-00000-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv


zeppelin - ls -laRt 02.csv/                                                                                                                                                                                        
02.csv/:
total 12
drwxr-sr-x    2 root     10000700      4096 Apr 28 09:56 .
-rw-r--r--    1 root     10000700         8 Apr 28 09:56 ._SUCCESS.crc
-rw-r--r--    1 root     10000700         0 Apr 28 09:56 _SUCCESS
drwxrwsr-x    5 root     10000700      4096 Apr 28 09:56 ..
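[A likely explanation for this layout, assuming the default Hadoop-style FileOutputCommitter: tasks write their part files under _temporary on the node where they run, and the job-level commit that promotes those files and writes _SUCCESS runs on the driver. With node-local file:// paths the two phases touch different disks, so the driver writes _SUCCESS on its own filesystem while the data stays in _temporary on the worker. A minimal single-filesystem simulation of that two-phase commit (function names are illustrative, not the real Hadoop API):]

```python
import os
import shutil
import tempfile

def task_commit(out, task_id):
    # Task side: each task writes its part file under _temporary/0/<task_id>/.
    # On a real cluster this happens on the worker's local disk.
    d = os.path.join(out, "_temporary", "0", task_id)
    os.makedirs(d)
    with open(os.path.join(d, "part-00000.csv"), "w") as f:
        f.write("a,b\n1,2\n")

def job_commit(out):
    # Driver side: promote task files into the output dir, then mark success.
    # With per-node local paths the driver cannot see the worker's
    # _temporary files, so this promotion never happens there.
    tmp = os.path.join(out, "_temporary", "0")
    for task in os.listdir(tmp):
        task_dir = os.path.join(tmp, task)
        for name in os.listdir(task_dir):
            shutil.move(os.path.join(task_dir, name), os.path.join(out, name))
    shutil.rmtree(os.path.join(out, "_temporary"))
    open(os.path.join(out, "_SUCCESS"), "w").close()

out = tempfile.mkdtemp()
task_commit(out, "task_20170428095516_0002_m_000000")
job_commit(out)
print(sorted(os.listdir(out)))  # ['_SUCCESS', 'part-00000.csv']
```

On a shared filesystem both phases see the same directory and the commit completes, which matches the suggestion to use a common mount point.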




