Minor documentation error for categorical bar pandas nested groupby example

Hello,

I think there is a small error in the example for categorical (bar) charts using groupby on a pandas dataframe.

In the previous example, the source for the vbar is a ColumnDataSource, and I think the nested example was intended to use a ColumnDataSource as well, but it passes the pandas groupby object directly.

The example seems to still produce the desired plot, but it was unclear to me where the ‘cyl_mfr’ and ‘mpg_mean’ names came from until I changed my copy of the example to create a ColumnDataSource from the multi-level groupby object and then looked at the data attribute of the ColumnDataSource.
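
For anyone else puzzled by those names, here is a minimal sketch of the inspection step described above. It assumes Bokeh and pandas are installed, and the small DataFrame is made-up stand-in data, not the autompg sample set used in the docs example.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Made-up stand-in data (assumption): the docs example uses the autompg sample set.
df = pd.DataFrame({
    "cyl": ["4", "4", "6", "6"],
    "mfr": ["ford", "honda", "ford", "ford"],
    "mpg": [30.0, 28.0, 20.0, 22.0],
})

group = df.groupby(["cyl", "mfr"])

# Wrapping the GroupBy explicitly makes the generated column names visible.
source = ColumnDataSource(group)
column_names = sorted(source.data.keys())
print(column_names)
```

The printed list should contain cyl_mfr for the flattened group index and one column per summary statistic, such as mpg_mean.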

Hi @bt1, I would actually suggest the converse: it was a specifically added feature to allow both DataFrames and GroupBy objects to be passed directly as the source parameter, as a convenience. The result is the same either way, but passing them directly saves having to explicitly import and create a ColumnDataSource, and many people prefer that. I would suggest that the examples that don’t pass a DataFrame or GroupBy directly are the ones that should be updated to do so.

This would be an ideal task for a new contributor, if you would like to submit a Pull Request to update things.

FYI The documentation for what Bokeh does with GroupBy objects is here:

@Bryan, sorry my first sentence was not well worded. I did not mean that being able to use the groupby directly was an error in any way.

After re-reading the two examples and your explanation, I don’t think there is an error, but maybe there is some additional explanation that would be helpful.

After looking at this further, I found that Bokeh uses the describe method of the DataFrameGroupBy object to produce the summary statistics and then flattens the MultiIndex in the result by joining its levels with an underscore. This is implied by the ‘cyl_mfr’ and ‘mpg_mean’ names in the examples. Is that documented anywhere else? If not, I think an explicit explanation of that would be helpful to add somewhere.
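
A rough pandas-only sketch of that naming logic is below. It is an approximation of the behavior described here, not Bokeh's actual implementation, and the DataFrame is made-up stand-in data.

```python
import pandas as pd

# Made-up stand-in data (assumption), shaped like the columns used in the example.
df = pd.DataFrame({
    "cyl": ["4", "4", "6", "6"],
    "mfr": ["ford", "honda", "ford", "ford"],
    "mpg": [30.0, 28.0, 20.0, 22.0],
})

# describe() yields MultiIndex columns such as ("mpg", "mean"),
# indexed by the MultiIndex of group keys ("cyl", "mfr").
stats = df.groupby(["cyl", "mfr"]).describe()

# Join each column tuple with an underscore: ("mpg", "mean") becomes "mpg_mean".
flat_columns = ["_".join(col) for col in stats.columns]

# Join the index level names the same way: ("cyl", "mfr") becomes "cyl_mfr".
index_name = "_".join(stats.index.names)

print(index_name)
print(flat_columns)
```

Under these assumptions, every numeric column that is not a group key contributes one flattened name per statistic that describe() reports.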

If adding some explanation of this to the example is appropriate, I would be happy to make a pull request.

I just came across where the MultiIndex flattening and the group.describe() behavior are explained in the documentation:

https://docs.bokeh.org/en/latest/docs/user_guide/data.html#pandas-multiindex

https://docs.bokeh.org/en/latest/docs/user_guide/data.html#pandas-groupby

I think a link from the pandas categorical examples to these two sections of the documentation would have cleared up my confusion.

@bt1 That behavior is documented here:

Providing data — Bokeh 2.4.2 Documentation

I am sure additional links, cross references or repetitions would be helpful, so we’d certainly be glad to have any contributions from you around this.