Notebook Storage and Git


Notebook Storage and Git

Tw UxTLi51Nus
Hi,

I'm not sure whether this belongs on dev@, but I thought I'd give it a
try here first ...

I am using Zeppelin with Git-controlled notebook storage. However, I
find the "git client" integrated in Zeppelin quite rudimentary, so I do
most of the VCS stuff via the CLI.

Two things are bothering me:

1) The naming scheme
On the file system, the notebooks are stored under random-looking names
(well, the folders are; the notebooks themselves are all called
note.json). Wouldn't it be better to mirror Zeppelin's notebook structure
on the file system, e.g. store a notebook named "nbfolder1/nbfolder2/nb1"
at "NOTEBOOK-STORAGE/nbfolder1/nbfolder2/nb1.json"?
Was this or something similar discussed / discarded at some point? If
discarded, why?

2) The notebooks contain the results
Because the results are stored in note.json, the file changes whenever
the notebook is run again, even when the "code" itself has not changed,
which makes comparing diffs really difficult. Why not use a second file
(e.g. notebook_results.json) to store the results and thus have a "clean"
notebook file to put into VC?
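
To make the second point concrete, here is a rough sketch of how results could be split out of a note before committing. It is only an illustration, not how Zeppelin stores notes: the per-paragraph key names in VOLATILE_KEYS ("results", "dateStarted", "dateFinished", "status") and the helper names split_note and dump_stable are assumptions and would need to be checked against the note.json written by the Zeppelin version in use.

```python
import copy
import json

# Assumed per-paragraph keys that change on every run; adjust to the
# fields your Zeppelin version actually writes into note.json.
VOLATILE_KEYS = ("results", "dateStarted", "dateFinished", "status")


def split_note(note, volatile_keys=VOLATILE_KEYS):
    """Return (clean_note, results) without modifying `note`."""
    clean = copy.deepcopy(note)
    results = {}
    for index, paragraph in enumerate(clean.get("paragraphs", [])):
        # Key the extracted data by paragraph id, falling back to position.
        key = paragraph.get("id", str(index))
        removed = {k: paragraph.pop(k) for k in volatile_keys if k in paragraph}
        if removed:
            results[key] = removed
    return clean, results


def dump_stable(obj):
    """Serialize with sorted keys and fixed indentation for stable diffs."""
    return json.dumps(obj, indent=2, sort_keys=True) + "\n"
```

With something like this, dump_stable(clean) could be written to the file that goes into VC and dump_stable(results) to a separate notebook_results.json that is left out of it (for example via .gitignore).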

Thanks,

--
Tw UxTLi51Nus
Email: [hidden email]

Re: Notebook Storage and Git

moon
Administrator
Hi,

There's a related issue for the naming scheme: https://issues.apache.org/jira/browse/ZEPPELIN-2702
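
For illustration, the layout asked about in the original post could be derived from the notebook name roughly as follows. This is a minimal sketch, not Zeppelin's implementation and not necessarily what the issue proposes: note_path is a hypothetical helper, and it assumes the notebook name uses "/" as the folder separator. It does not deal with two notebooks sharing the same name, which a name-based layout would have to resolve somehow.

```python
from pathlib import PurePosixPath


def note_path(storage_root, note_name):
    """Map a notebook name like 'nbfolder1/nbfolder2/nb1' to a file path."""
    # Drop empty segments produced by leading, trailing or doubled slashes.
    parts = [p for p in note_name.split("/") if p]
    # Refuse names that are empty or would escape the storage root.
    if not parts or any(p in (".", "..") for p in parts):
        raise ValueError(f"unusable notebook name: {note_name!r}")
    *folders, leaf = parts
    return PurePosixPath(storage_root, *folders, leaf + ".json")
```

Called as note_path("NOTEBOOK-STORAGE", "nbfolder1/nbfolder2/nb1"), this yields the path NOTEBOOK-STORAGE/nbfolder1/nbfolder2/nb1.json.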

I think a version-control-friendly notebook file format is an interesting subject to discuss. The related issue is https://issues.apache.org/jira/browse/ZEPPELIN-451

Thanks,
moon

On Thu, Jun 29, 2017 at 5:51 PM Tw UxTLi51Nus <[hidden email]> wrote:
> [...]

Re: Notebook Storage and Git

Tw UxTLi51Nus
Hi,

> I think Version Control System friendly notebook file format is interesting
> subject to discuss. related issue is
> https://issues.apache.org/jira/browse/ZEPPELIN-451

Yeah, I think so too. Should I add a comment to the mentioned issue? Or
where would you suggest starting this discussion?

THX


On Thursday, June 29, 2017 11:00:11 AM CEST moon soo Lee wrote:
> [...]


--
Tw UxTLi51Nus
Email: [hidden email]


Re: Notebook Storage and Git

moon
Administrator
Please feel free to comment on the issues or related pull requests, or continue the discussion on the mailing list.

Thanks!
moon

On Fri, Jul 14, 2017 at 4:35 AM Tw UxTLi51Nus <[hidden email]> wrote:
> [...]
