[DISCUSS] Release package size

[DISCUSS] Release package size

Mina Lee-2
Hi all,

Zeppelin is about to start the 0.7.0 release process, and I would like to discuss binary package distribution.

Every time we distribute a new binary package, the zeppelin-0.x.x-bin-all.tgz package gets bigger:
   - zeppelin-0.6.0-bin-all.tgz: 506M
   - zeppelin-0.6.1-bin-all.tgz: 517M
   - zeppelin-0.6.2-bin-all.tgz: 547M
   - zeppelin-0.7.0-bin-all.tgz: 720M (Expected)
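
To put the jump in perspective, here is a small Python sketch that computes the release-over-release growth from the sizes listed above. The function name `growth_table` is made up for illustration, and the 0.7.0 figure is the expected size, not a measured one.

```python
# Sizes (in MB) of the -bin-all packages, taken from the list above.
# The 0.7.0 value is the expected size, not a measured one.
sizes_mb = {
    "0.6.0": 506,
    "0.6.1": 517,
    "0.6.2": 547,
    "0.7.0": 720,
}


def growth_table(sizes):
    """Return (version, size, percent growth over the previous release) rows."""
    rows = []
    previous = None
    for version, size in sizes.items():
        # The first release has no predecessor, so its growth is None.
        pct = None if previous is None else (size - previous) / previous * 100
        rows.append((version, size, pct))
        previous = size
    return rows


for version, size, pct in growth_table(sizes_mb):
    change = "n/a" if pct is None else f"{pct:+.1f}%"
    print(f"zeppelin-{version}-bin-all.tgz: {size}M ({change})")
```

By this arithmetic, the expected 0.7.0 package grows by roughly 32% over 0.6.2, compared with about 2% and 6% for the two previous releases.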

This is mostly because the number of interpreters supported by Zeppelin keeps growing,
and there is a high chance that we will support more interpreters in the near future.
So instead of asking the Apache infra team to increase the limit,
I would like to suggest that, starting with the 0.7.0 release, we ship only zeppelin-0.7.0-bin-netinst.tgz, which includes only the Spark interpreter.
One concern is that users need one more step to install the interpreters they use,
but I believe it can be done easily with a single command [1].
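
As a rough illustration of that extra step, the sketch below builds the command line for Zeppelin's `bin/install-interpreter.sh` script, using its `--name` and `--all` options. The helper `install_command` and the interpreter names passed to it are assumptions made for this example; the interpreter installation guide lists the names a given release accepts.

```python
# Hypothetical helper: builds the argument list for Zeppelin's
# bin/install-interpreter.sh. Nothing is executed here.
def install_command(interpreters=None, script="./bin/install-interpreter.sh"):
    """Return the argv list that installs the given interpreters.

    interpreters: list of interpreter names, or None to install all
    community managed interpreters via --all.
    """
    if interpreters is None:
        return [script, "--all"]
    if not interpreters:
        raise ValueError("pass at least one interpreter name, or None for --all")
    # The script takes a comma-separated list after --name.
    return [script, "--name", ",".join(interpreters)]


# Example names only; check the guide for the names in your release.
print(" ".join(install_command(["md", "jdbc"])))
print(" ".join(install_command()))
```

The returned list can be handed to `subprocess.run` from the Zeppelin installation directory if you want to run the command from Python.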

FYI, here is a link to a similar discussion [2] that we had on the mailing list last June.

Regards,
Mina

Re: [DISCUSS] Release package size

Jeff Zhang

+1, we should also mention it in the release notes and in the 0.7 docs.



Re: [DISCUSS] Release package size

Jun Kim
+1

I think it won't be a problem if we give clear notice.
Maybe we can do that next to the download button here (http://zeppelin.apache.org/download.html).
The message could be: "NOTE: only the Spark interpreter is included since 0.7.0. If you want other interpreters, please see the interpreter installation guide."

--
Taejun Kim

Data Mining Lab.
School of Electrical and Computer Engineering
University of Seoul

Re: [DISCUSS] Release package size

Prabhjyot Singh-2
+1

I don't think it's a problem now, but if the size keeps increasing, then in subsequent releases we can ship Zeppelin with a few interpreters and mark the others as plugins that can be downloaded later, with instructions on how to configure them.

Re: [DISCUSS] Release package size

Ahyoung Ryu-3
Thanks as always, Mina!
+1 for releasing only the netinst package.

Re: [DISCUSS] Release package size

Jeff Zhang

How about also including the markdown and jdbc interpreters, if this won't make the binary distribution much bigger? I guess spark, markdown, and jdbc are the top 3 interpreters in Zeppelin.



Re: [DISCUSS] Release package size

Jun Kim
+1 for Jeff's idea! I also mainly use those three interpreters :)

Re: [DISCUSS] Release package size

Jeff Zhang

Another thing I'd like to discuss: should we move most of the interpreters out of the Zeppelin project to somewhere else, as Spark does with spark-packages? There would be 2 benefits:

1. Keep the Zeppelin project much smaller.
2. Each interpreter's improvements won't be blocked by the Zeppelin release. Each interpreter can have its own release cycle as long as zeppelin-interpreter doesn't break compatibility.

If it makes sense, I can open another thread to discuss it.




Re: [DISCUSS] Release package size

moon
Administrator
Hi,

+1 for releasing the netinst package only.

Regarding limiting the binary package to a few interpreters such as spark, markdown, and jdbc: we discussed having a minimal package in [1].
I still think it's very difficult to decide which interpreters should be included and which should not. For example, I would prefer to have 'sh' and 'python' included too, and other people might have other opinions. It's also difficult to explain why some interpreters are included in the binary release while others are not, unless we have a policy that everyone agrees on.

Regarding 3rd party interpreters,
Nothing stops anyone from building an interpreter in a separate project. Zeppelin's interpreter installation script [2] supports 3rd party interpreters, and Zeppelin is already capable of loading 3rd party interpreter binaries. However, I haven't seen many people using this feature. I also have some ideas about how we can encourage people to make 3rd party interpreters. Let's open a separate thread and discuss it there.
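
For readers who have not tried the 3rd party path, here is a hedged Python sketch of what such an install could look like. It builds a command line for `bin/install-interpreter.sh` with its `--name` and `--artifact` options. The helper `third_party_install_command`, the interpreter name `myinterp`, and the Maven coordinates `com.example:my-interpreter:1.0.0` are all hypothetical and only serve the example.

```python
# Hypothetical helper: builds the argument list for installing a 3rd party
# interpreter with Zeppelin's bin/install-interpreter.sh. Nothing is executed.
def third_party_install_command(name, artifact,
                                script="./bin/install-interpreter.sh"):
    """Return the argv list for one 3rd party interpreter.

    artifact is expected as Maven coordinates: groupId:artifactId:version.
    """
    parts = artifact.split(":")
    # Reject coordinates with missing or empty components early.
    if len(parts) != 3 or not all(parts):
        raise ValueError("artifact must look like groupId:artifactId:version")
    return [script, "--name", name, "--artifact", artifact]


# Made-up interpreter name and coordinates, for illustration only.
print(" ".join(third_party_install_command(
    "myinterp", "com.example:my-interpreter:1.0.0")))
```

Validating the coordinates before building the command is a design choice of this sketch, not something the script requires of callers.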

Thanks,
moon



Re: [DISCUSS] Release package size

Mohit Jaggi
In reply to this post by Jeff Zhang
Including ALL interpreters is not feasible, not because of download size, since that limit is easily increased, but because we wouldn't want to couple the release cycles, as Jeff pointed out. IMHO a few of the most popular ones should be included. Yes, it is just one extra step, but if a computer can do it, why make a human suffer? :-)
Re: spark-packages, Spark does include important and mature functionality in its assembly, e.g. the CSV parser was merged into core Spark when it matured. I believe Zeppelin should do the same.

Re: [DISCUSS] Release package size

Eric Pugh
Can I throw out an alternate approach? I feel like the key value of the “-all” option is to simplify the life of someone who is new to Zeppelin. If you’re a sophisticated Zeppelin user, then picking and choosing interpreters is easy, and you grok why you want to do that.

However, for myself, when I want to demo Zeppelin, I go straight to one of the Docker images, specifically https://github.com/dylanmei/docker-zeppelin, because it bundles in everything.

Would providing a similar Docker image on the “Get Zeppelin” page that bundles in all the dependencies and interpreters solve the “how do I try Zeppelin in 5 minutes” challenge? The “Get Zeppelin” page is a rather daunting page!

Eric


_______________________
Eric Pugh Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy  
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.

Re: [DISCUSS] Release package size

Jongyoul Lee
I'd like to ship netinst only. It's also a good idea for Apache Zeppelin to provide an official Docker image with all available interpreters.

--
이종열, Jongyoul Lee, 李宗烈

Re: [DISCUSS] Release package size

Mina Lee-2
Thank you all for sharing your opinions.

I like Eric's approach.
We are planning to provide an official Docker image managed by the community.
There is ongoing work [1] on it, and I can focus on it after the 0.7.0 release.

It seems that the majority prefers a binary package with the most-used interpreters, such as spark, md, and jdbc.
I think we can gradually move to providing only the netinst package once the Docker image is ready.
For the upcoming 0.7.0 release, I'd like to distribute two binary packages:
  - zeppelin-bin-min (spark, jdbc, md)
  - zeppelin-bin-netinst (spark only)


Thanks,
Mina


Re: [DISCUSS] Release package size

moon
Administrator
Hi,

I think we need a policy for deciding which interpreters go into the zeppelin-bin-min package, and we should make applying that policy part of the release process.
So far I cannot see any consistent rule other than "it seems" or "I guess", and I have no idea how I would answer if somebody asks "why is python not in the min package?" or "why is xxx not in the min package?".

If we really want a min package, we must have a policy that gives everyone the same expectation about what goes into it and what does not. Once we agree on a policy, we can make it part of the release process.

So, why don't we try to define the policy together? Here are some ideas to start with:

 a. The min package includes interpreters whose binary size is less than 10MB.
 b. The min package includes interpreters with 5 or more JIRA issues created per month.
 c. The min package includes or excludes interpreters as the community decides via a formal vote.

"10MB" and "5 or more" are numbers I just made up; we can change them to more reasonable ones.
Also, a, b, and c are only possible examples. We can refine them, use only one, use all three, or add more.
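
As a rough illustration of how rules like a, b, and c could be applied mechanically during a release, here is a minimal Python sketch. Everything in it is hypothetical: the `InterpreterStats` record, the function names, and the way the rules are combined (an interpreter must pass every enabled rule, and a vote can then override the result) are assumptions made for the example, not an agreed policy or an existing Zeppelin tool. The thresholds are the made-up "10MB" and "5 or more" from above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

MB = 1024 * 1024


@dataclass(frozen=True)
class InterpreterStats:
    """Hypothetical per-interpreter record collected at release time."""
    name: str
    binary_size_bytes: int
    jira_issues_per_month: float


def passes_size_rule(s: InterpreterStats, max_size_bytes: int = 10 * MB) -> bool:
    # Option a: binary size strictly below the (placeholder) 10MB limit.
    return s.binary_size_bytes < max_size_bytes


def passes_activity_rule(s: InterpreterStats, min_issues: float = 5) -> bool:
    # Option b: at least the (placeholder) 5 JIRA issues created per month.
    return s.jira_issues_per_month >= min_issues


def select_min_package(
    stats: Iterable[InterpreterStats],
    rules: Iterable[Callable[[InterpreterStats], bool]],
    voted_in: Iterable[str] = (),
    voted_out: Iterable[str] = (),
) -> List[str]:
    """Keep interpreters that pass every enabled rule, then apply vote overrides (option c)."""
    rules = list(rules)
    selected = {s.name for s in stats if all(rule(s) for rule in rules)}
    selected |= set(voted_in)   # community vote can force inclusion
    selected -= set(voted_out)  # ...or force exclusion
    return sorted(selected)
```

Passing `[passes_size_rule]`, `[passes_activity_rule]`, or both as `rules` corresponds to using only one option or several together, and the vote arguments let a formal community decision take precedence over the numeric rules.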

My point is that we need to give everyone the same expectation about what goes into the min package and what does not.
What do you think?

Thanks,
moon


Re: [DISCUSS] Release package size

Mina Lee-2
Decision making taking more time than I expected and
I think this shouldn't be blocker for 0.7.0.

We can take more time deciding which interpreters can be included or excluded.
Until then, I am just going to go with our current one: zeppelin-bin-all, zeppelin-bin-netinst.

Moon's suggestion looks good too.
Here I summarized interpreter lists that can be included for each option:
 a. Min package includes interpreters, binary size less than 10MB
      > angular, bigquery, hdfs, kylin, livy, md, postgresql, python, sh
 b. Min package includes interpreters 5 or more JIRA issue created per month.
      > Need to track. This can be overload for release process.
 c. Min package includes/exclude interpreter that community decide via formal vote.
     > md, jdbc, spark (based on this mailing thread)


