Does bokeh do statistical calculations on each subset of data separately?

Hi,

While reading the bokeh source code, I found that bokeh seems to do statistical calculations on each subset of the data separately. For example, for a Bar chart where I specify a label, the pandas DataFrame is first split into subsets by label, and then the calculation (mean, sum, count, or whatever) is applied to each subset. Am I reading it correctly? My concern is that it might be more performant to do the calculation on the original whole DataFrame, since the calculation on each subset is the same. Thanks in advance for any help.
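
To make the question concrete, here is a minimal sketch of the two approaches in plain pandas. It does not reproduce the bokeh.charts internals; the column names, the sample data, and both helper functions are assumptions made up for illustration.

```python
import pandas as pd

# Made-up sample data: one label column and one numeric column.
df = pd.DataFrame(
    {
        "label": ["a", "b", "a", "c", "b", "a"],
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)


def per_subset_mean(frame, label_col, value_col):
    """Split the frame by label first, then aggregate each subset on its own."""
    results = {}
    for label in frame[label_col].unique():
        subset = frame[frame[label_col] == label]  # one filtering pass per label
        results[label] = subset[value_col].mean()
    return pd.Series(results, name=value_col).sort_index()


def grouped_mean(frame, label_col, value_col):
    """Aggregate on the whole frame with a single groupby call."""
    return frame.groupby(label_col)[value_col].mean()


split_result = per_subset_mean(df, "label", "value")
grouped_result = grouped_mean(df, "label", "value")
```

Both functions give the same per-label means. The difference is only in how the work is organized: the first filters the frame once per label, the second hands the whole frame to pandas in one call.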

Hi,

This might be true. Unfortunately, bokeh.charts does not currently have a maintainer, so I can't say much about when it might see improvement. In fact, bokeh.charts is probably going to be split off into a separate project so that the community can maintain it more easily. We will have more to say about this and other possibilities in the near future.

In the meantime, my general recommendation is to stick with the bokeh.plotting API for most things. It's a little bit lower level (i.e. stats are up to you to compute), but it has been stable and well documented for several years now.

Thanks,

Bryan

On Feb 24, 2017, at 01:22, Jeff Zhang <[email protected]> wrote:
> [...]

--
You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/141a2d17-8753-4ab5-983e-480389d7169f%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.

Thanks, Bryan. Is there any concrete plan or ETA for splitting off bokeh.charts? I am investigating how to integrate Spark DataFrames into bokeh (a Spark DataFrame is similar to a pandas DataFrame, but is distributed). For now, it seems easier and more appropriate to integrate Spark DataFrames with bokeh.charts.

On Sat, Mar 11, 2017 at 12:08 PM, Bryan Van de ven <[email protected]> wrote:
> [...]

Hi,

Tentatively, I would like to split bokeh.charts into a new project/repository for the 0.12.6 release. At that time the new bkcharts package would be listed as a dependency of the main bokeh library, and a stub "bokeh.charts" module would transitively import all the existing top-level functions from bkcharts. So all existing usage should still function as-is, except with a deprecation warning directing people to install and use "bkcharts" directly. I would expect things to remain this way until 1.0, at which time the main bokeh project would drop the package dependency on bkcharts, and the stub "bokeh.charts" module would also be removed.
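
For readers unfamiliar with this kind of shim, the following is a rough sketch of how a stub module can re-export another package's top-level names while emitting a deprecation warning. It is not the actual bokeh or bkcharts code: the module names "newcharts" and "legacy.charts", the placeholder `Bar` function, and the `build_stub` helper are all hypothetical stand-ins.

```python
import importlib
import sys
import types
import warnings

# Stand-in for the split-off package. "newcharts" and its Bar function are
# hypothetical placeholders, not the real bkcharts API.
newcharts = types.ModuleType("newcharts")


def Bar(data):
    return ("Bar", data)


newcharts.Bar = Bar
newcharts.__all__ = ["Bar"]
sys.modules["newcharts"] = newcharts


def build_stub(old_name, new_name):
    """Return a module named old_name that re-exports new_name's public API."""
    warnings.warn(
        f"{old_name} is deprecated; install and import {new_name} directly",
        DeprecationWarning,
        stacklevel=2,
    )
    target = importlib.import_module(new_name)
    stub = types.ModuleType(old_name)
    for name in target.__all__:
        # Same objects, reachable under the old module name.
        setattr(stub, name, getattr(target, name))
    return stub


# Record the warning so it can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_charts = build_stub("legacy.charts", "newcharts")
```

Code that used the old name keeps working because `legacy_charts.Bar` is the very same function object as `newcharts.Bar`; the only visible change is the warning.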

If I can offer some unsolicited advice, I would suggest not building new things on top of bokeh.charts directly. The main reason is that bokeh.models and bokeh.plotting are both rock solid, stable, supported, documented, and demonstrated at this point. Either of them is a much better foundation for *building* on. Making a histogram or boxplot, etc. from bokeh.plotting is not that much more work. There are a lot of deficiencies around bokeh.charts performance right now, basically because it tried to be extremely general (let's say *too* general). So my suggestion would be to use bokeh.plotting to create much simpler, more straightforward versions of any charts you need, with an interface that Zeppelin users would appreciate.
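
As an illustration of the "not that much more work" point, here is a small sketch of a histogram built with bokeh.plotting, with the binning done by NumPy beforehand. The `histogram_figure` helper, its parameters, and the random sample data are assumptions for the example, not part of bokeh.

```python
import numpy as np
from bokeh.plotting import figure


def histogram_figure(values, bins=20, title="Histogram"):
    """Bin the values with NumPy and draw one quad glyph per bin."""
    counts, edges = np.histogram(values, bins=bins)
    plot = figure(title=title)
    # Each bin spans edges[i]..edges[i + 1] and is counts[i] tall.
    renderer = plot.quad(top=counts, bottom=0, left=edges[:-1], right=edges[1:])
    return plot, renderer, counts, edges


# Made-up data: 1000 draws from a standard normal distribution.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)
plot, renderer, counts, edges = histogram_figure(samples, bins=20)
```

Calling `show(plot)` (importable from bokeh.plotting) would render the figure. Only the small `counts` and `edges` arrays reach bokeh, so the binning step could in principle be replaced by whatever computes those numbers for your data source.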

Thanks,

Bryan

On Mar 13, 2017, at 00:56, Jeff Zhang <[email protected]> wrote:
> [...]

Thanks for the information and advice, Bryan.

On Mon, Mar 13, 2017 at 10:35 PM, Bryan Van de ven <[email protected]> wrote:
> [...]