[Proposal] Zeppelin Client API (Zeppelin SDK)

Jeff Zhang
Hi Folks,

I'd like to discuss a proposal for a Zeppelin client API (Zeppelin SDK).
The background is that Zeppelin's main usage scenario today is interactive data analysis. Although it provides a REST API, it is not easy for an external system (e.g. a scheduler) to integrate with Zeppelin when Zeppelin is used as a backend job service. So I propose introducing a new module, the Zeppelin client API (Zeppelin SDK), whose purpose is to provide an easy API for external systems to integrate with Zeppelin.

I have created a Google doc with the details; comments and feedback are welcome.

https://docs.google.com/document/d/1bLLKKxleZlZpP9EFJlLLkJKwDBps-RNvzNwh3LFZWZ4/edit?usp=sharing

--
Best Regards

Jeff Zhang

Re: [Proposal] Zeppelin Client API (Zeppelin SDK)

Eric Pugh
:-) That makes sense to me. I just feel that the words "Low" and "High" make it seem like they are for the same scenario, differing only in the level of knowledge needed by the user.

Having said that, I do really like the idea of letting Zeppelin solve more use cases!


On Jul 25, 2020, at 9:43 AM, Jeff Zhang <[hidden email]> wrote:

Hi Eric,

Thanks for your feedback. I named them the high-level API and the low-level API only because the high-level API depends on the low-level one. They are actually for different scenarios. The low-level API is for the scenario where users write code in a notebook and would like to schedule the note via an external system. The high-level API is for the scenario where users treat Zeppelin as a job server: they don't need to write code in Zeppelin beforehand, they just submit code to Zeppelin and Zeppelin executes it. This is much like Apache Livy, https://livy.apache.org/
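
To make the distinction above concrete, here is a minimal Python sketch. Every name in it (FakeZeppelin, LowLevelClient, HighLevelSession, run_note, submit) is hypothetical and is not taken from the proposal doc or from Zeppelin itself. The in-memory "server" only records the code it is given, and the idea that the high-level layer wraps submitted code in a temporary note is purely an assumption, used to illustrate "high level depends on low level".

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeZeppelin:
    """In-memory stand-in for a Zeppelin server (hypothetical, illustration only)."""
    notes: Dict[str, List[str]] = field(default_factory=dict)
    executed: List[str] = field(default_factory=list)

    def execute(self, code: str) -> str:
        # A real server would hand the code to an interpreter; here we only record it.
        self.executed.append(code)
        return f"ran: {code}"


class LowLevelClient:
    """Low-level style: operate on notes that already exist in the notebook."""

    def __init__(self, server: FakeZeppelin) -> None:
        self.server = server

    def run_note(self, note_id: str) -> List[str]:
        # The caller only knows the note id; the code itself lives inside Zeppelin.
        return [self.server.execute(p) for p in self.server.notes[note_id]]


class HighLevelSession:
    """High-level style: submit ad hoc code, treating Zeppelin as a job server."""

    def __init__(self, server: FakeZeppelin, interpreter: str) -> None:
        self.client = LowLevelClient(server)
        self.interpreter = interpreter
        self._count = 0

    def submit(self, code: str) -> str:
        # Assumption of this sketch: wrap the code in a temporary note,
        # then reuse the low-level client to run it.
        self._count += 1
        note_id = f"tmp-{self._count}"
        self.client.server.notes[note_id] = [f"%{self.interpreter} {code}"]
        return self.client.run_note(note_id)[0]
```

With this sketch, a scheduler would call LowLevelClient(server).run_note("daily-report") for a note someone wrote beforehand, while a job-submission system would call HighLevelSession(server, "python").submit("print('hello world')") without touching any existing note.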

On Sat, Jul 25, 2020 at 12:45 AM, Eric Pugh <[hidden email]> wrote:
Thanks Jeff for sharing this. I've often wanted to take what I did in my notebook and turn that logic flow into something that could be triggered by other processes. I used the cron feature that was always available in the 0.8.x line of Zeppelin, and had the end of my notebook do an HTTP PUT with my output calculations, for example ;-).

I'm not sure about the terms High and Low. The other project I saw that used those terms was Apache POI: they had a high-level API for an Excel spreadsheet that abstracted a lot away, and then a very low-level one where you worked with the basic data structures.

I would think that “High Level” is working with notebooks and paragraphs, but without really knowing what is going on inside them. I interact with Notebook X and Notebook Y, but they are black boxes to me. Whereas “Low Level” would be “I am actually running code against Zeppelin, and understand how to run code on Zeppelin”.

I know this is the opposite of your definition!

Regardless of naming, more ways to leverage Zeppelin would be nice.  



> On Jul 24, 2020, at 11:53 AM, Jeff Zhang <[hidden email]> wrote:
> 
> Hi Folks,
> 
> I'd like to discuss this proposal with you about the zeppelin client api (zeppelin sdk).
> The background is that now Zeppelin’s main usage scenario is interactive data analysis. Although it provides rest api, it is not easy for an external system (e.g. scheduler system) to integrate Zeppelin for the scenario where zeppelin is used as a backend job service. So I propose to introduce a new module: Zeppelin client api (Zeppelin SDK), whose purpose is to provide easy api for external systems to integrate zeppelin.
> 
> I have created a google doc for the details, welcome any comments and feedback.
> 
> https://docs.google.com/document/d/1bLLKKxleZlZpP9EFJlLLkJKwDBps-RNvzNwh3LFZWZ4/edit?usp=sharing
> 
> 
> -- 
> Best Regards
> 
> Jeff Zhang

_______________________
Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com | My Free/Busy <http://tinyurl.com/eric-cal>
Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.




Re: [Proposal] Zeppelin Client API (Zeppelin SDK)

Jeff Zhang
Thanks for your feedback. I have created a ticket: https://issues.apache.org/jira/browse/ZEPPELIN-4981

As Alex Ott mentioned, this API may need several iterations, so I plan to ship it as an experimental feature first and then refine the API based on user feedback.

On Fri, Jul 31, 2020 at 7:48 PM, Alex Ott <[hidden email]> wrote:
The idea is very good. I think we'll need several iterations of API refinement, but the current approach looks promising.

--
With best wishes,                    Alex Ott
http://alexott.net/
Twitter: alexott_en (English), alexott (Russian)


--
Best Regards

Jeff Zhang

Re: [Proposal] Zeppelin Client API (Zeppelin SDK)

Jeff Zhang
Good idea, Moon. I think a CLI would be a good start for users. This kind of CLI would be a combination of spark-shell, Hive Beeline, the Flink Scala shell, and the Python shell.

e.g.
>>> %spark sc.version
>>> %hive show tables
>>> %python print('hello world')
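
As a rough illustration of how such a CLI could route each input line, here is a minimal Python sketch. The function name parse_cli_line and the fallback to a default interpreter are assumptions made for this example; nothing here is part of Zeppelin or of the proposal. It only splits a line into an interpreter name and the code to run, and does not talk to any server.

```python
from typing import Tuple


def parse_cli_line(line: str, default_interpreter: str = "spark") -> Tuple[str, str]:
    """Split a line such as '%spark sc.version' into (interpreter, code)."""
    text = line.strip()
    if not text.startswith("%"):
        # No prefix: fall back to a default interpreter (an assumption of this sketch).
        return default_interpreter, text
    # Everything up to the first space names the interpreter; the rest is code.
    head, _, code = text[1:].partition(" ")
    if not head:
        raise ValueError(f"missing interpreter name in: {line!r}")
    return head, code.strip()
```

A real CLI would then hand the (interpreter, code) pair to whatever the high-level API ends up exposing for code submission.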



On Fri, Jul 31, 2020 at 10:55 PM, moon soo Lee <[hidden email]> wrote:
Looks really good!
Do you think having a CLI on top of the high-level API would also be useful?


--
Best Regards

Jeff Zhang

Re: [Proposal] Zeppelin Client API (Zeppelin SDK)

Jongyoul Lee
Sounds good to me as well.

BTW, don't we have to change some web socket code as well? <- It's just an idea.

I would like to see a PoC for it, and I'm willing to help improve it because my company needs it too.

--
이종열, Jongyoul Lee, 李宗烈