Scratching my head around Zeppelin/Spark & Docker


DuyHai Doan
Hello guys

Has anyone attempted to run Zeppelin inside Docker, connecting to a real Spark cluster running on the host machine?

I've spent a day trying to make it work, unsuccessfully: the job never completes because the driver program (the Zeppelin Spark shell) is listening on the container's internal IP address.

I've tried running the Docker container with host networking (--net=host), but in that case I cannot access Zeppelin through localhost:8080 or 127.0.0.1:8080.

Does anyone have an idea to unblock my use case?

George Webster
Can you share your Dockerfile? Also, are you using docker-compose or just docker-machine?



DuyHai Doan
No, using Docker Compose would be easy; what I want is:

1) Zeppelin running inside a Docker container
2) Spark deployed in standalone mode, running somewhere on bare metal / cloud / Docker, but on another network

In this scenario, it's very hard to get the Zeppelin client living inside the Docker container to communicate with the external Spark cluster.
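For what it's worth, a minimal Compose sketch of that setup (the image name, master URL, and port numbers here are illustrative assumptions, not a tested config): the part that usually matters is publishing the Zeppelin UI plus a fixed set of driver callback ports.

```yaml
# Hypothetical sketch only: image, master URL, and ports are assumptions.
version: "2"
services:
  zeppelin:
    image: apache/zeppelin
    ports:
      - "8080:8080"   # Zeppelin web UI
      - "9991:9991"   # a fixed spark.driver.port, so workers can call back
      - "9992:9992"   # a fixed spark.blockManager.port
    environment:
      MASTER: spark://spark-master.example.com:7077
```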




Vinay Shukla
Just curious, what is the use case for running Zeppelin inside Docker?




Luciano Resende
In reply to this post by DuyHai Doan
Not exactly what you want, but I have an example here:
https://github.com/lresende/docker-systemml-notebook

You should be able to accomplish what you want by playing with --link, which I did in the example below (but only with YARN and HDFS):
https://github.com/lresende/docker-yarn-cluster


Jim Lola
In reply to this post by DuyHai Doan
I heard that the IBM folks were supposedly able to do this earlier this year.

Unfortunately, those people may have been part of the continued layoffs at IBM.



Luciano Resende
In reply to this post by Luciano Resende



BTW, you might have to use Livy to access the remote Spark cluster.


Trevor Grant
I was having some issues with this too.

Did you try using %sh to ping the machine running Spark? I.e., check for a networking issue.
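For reference, that check can be run directly from a notebook cell via Zeppelin's shell interpreter (the IP below is just a placeholder for the Spark master's address):

```
%sh
# Basic reachability check from inside the container to the Spark host
ping -c 3 192.168.1.16
```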

Trevor Grant
Data Scientist

"Fortunate is he, who is able to know the causes of things."  -Virgil




DuyHai Doan
My host IP is 192.168.1.16.
The VM IP is 10.0.2.15 (using Docker for Mac).

Indeed, I tried many things:

1) Using the host network (--net=host), but then I cannot access Zeppelin (localhost:8080 or 127.0.0.1:8080). Zeppelin listens on the address 0.0.0.0 by default (config in zeppelin-site.xml); changing it to 127.0.0.1 does not help.

2) Using the bridge network but configuring the ports:
    - Run the container with -p 8080:8080 -p 8081:8081 -p 4040:4040.
      This time I can access the Zeppelin web UI, but then the Spark job hangs.

    - Edit $ZEPPELIN_HOME/bin/interpreter.sh to start Zeppelin in "cluster" mode (--deploy-mode cluster), but it does not work.

    - Force the driver host to 192.168.1.16 and the driver port to a fixed port (9991) instead of a random one --> the job launches successfully, but when calling rdd.collect() the worker running on the host machine cannot send the result back to the driver program inside the container:

16/08/05 14:16:40 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
java.io.IOException: Failed to connect to /10.0.2.15:42418
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:193)

The driver program is still listening on the internal IP address on a dynamic port...
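For anyone landing here later, a sketch of the Spark properties this approach implies (the property names are real Spark configs; the host IP and port values are just the ones from this thread, not universal values):

```
# Illustrative values only, taken from the setup described in this thread.

# Address the workers should use to call back into the driver:
spark.driver.host        192.168.1.16

# Fix the driver port instead of a random one, and publish it (-p 9991:9991):
spark.driver.port        9991

# Also fix and publish the block manager port, otherwise fetching results
# back (e.g. for rdd.collect()) fails as in the RetryingBlockFetcher error:
spark.blockManager.port  9992
```

Note that even with these set, Spark 2.0 and earlier bind the driver to the address in spark.driver.host; only from Spark 2.1 on does spark.driver.bindAddress let the driver bind a container-internal address while advertising spark.driver.host to the cluster, which is what this scenario really needs.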




Eric Charles
Late reply...

Connecting from the Zeppelin Docker image to an external Hadoop cluster works for me.

If you want to connect to Hadoop on your host laptop, you have to hack the IP address.

More info at http://platform.datalayer.io/guide/latest/docker/zeppelin (read the "Spark in YARN mode" section).
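One common form of that IP hack is a hosts-file mapping, so the cluster's hostnames resolve to a reachable address from inside the container; a hypothetical Compose fragment (the hostnames, image, and IP are placeholders):

```yaml
# Illustrative only: map Hadoop hostnames to the host machine's IP.
services:
  zeppelin:
    image: apache/zeppelin
    extra_hosts:
      - "namenode.example.com:192.168.1.16"
      - "resourcemanager.example.com:192.168.1.16"
```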

