How to build a pivot table in a standalone file?

Al7 · June 14, 2022, 1:13pm

Hello everyone,

Simply I want to create a pivot table in a standalone file, no servers or outside help. Of course the values of the pivot table will be taken from the selected CDS.

How can I get around this? Extending Bokeh is extremely overwhelming for me to understand at this point. Baby steps would be appreciated.

gmerritt123 · June 14, 2022, 1:23pm

A pivot table can mean a lot of things, the biggest piece is if you need aggregation (in the event you have duplicate indices) or not (i.e. every index is unique and you’re just basically transposing). Can you provide a Minimal Reproducible Example that has a CDS with data, and a description of what you’d like the pivot piece to do?

Al7 · June 14, 2022, 3:42pm

Thank you for your reply. I have used auto-mpg2 sample data from bokeh. The expected results-- loaded using excel pivot table, should be:

Row Labels	Average of cyl	Average of cty	Average of hwy	Count of model
2seater	8	15.4	24.8	5
compact	4.595744681	20.12765957	28.29787234	47
midsize	5.317073171	18.75609756	27.29268293	41
minivan	5.818181818	15.81818182	22.36363636	11
pickup	7.03030303	13	16.87878788	33
subcompact	5.028571429	20.37142857	28.14285714	35
suv	6.967741935	13.5	18.12903226	62
Grand Total	5.888888889	16.85897436	23.44017094	234

Now let’s say I have a multi-select with manufacturer as option, once I deselect one of them The above table should be updated.

Below table is after I removed Audi:

Row Labels	Average of cyl	Average of cty	Average of hwy	Count of model
2seater	8	15.4	24.8	5
compact	4.4375	21.15625	28.9375	32
midsize	5.210526316	18.97368421	27.55263158	38
minivan	5.818181818	15.81818182	22.36363636	11
pickup	7.03030303	13	16.87878788	33
subcompact	5.028571429	20.37142857	28.14285714	35
suv	6.967741935	13.5	18.12903226	62
Grand Total	5.944444444	16.7962963	23.18981481	216

Or selecting only Audi:

Row Labels	Average of cyl	Average of cty	Average of hwy	Count of model
compact	4.933333333	17.93333333	26.93333333	15
midsize	6.666666667	16	24	3
Grand Total	5.222222222	17.61111111	26.44444444	18

The columns themselves can be predefined, it doesn’t have to be dynamic with field selection options like excel.

gmerritt123 · June 14, 2022, 4:54pm

Right so you are predefining columns, predefining the aggregation (3 averages and a count). Are you predefining fields to groupby (in your example you are grouping by car type - compact/midsize etc). Are you only providing one groupby field at a time (e.g. only grouping data by car type, and never grouping by car type AND number of cylinders)?

Al7 · June 14, 2022, 4:59pm

I would love to have the option to group by two fields, and yes I will be predefining both groupby fields.

Thank you so much for all the help and support @gmerritt123

gmerritt123 · June 14, 2022, 6:33pm

Ok. I may not have bandwidth to write this all out explicitly for you but I can give some guidance on logic/what you need to implement.

First of all, you absolutely can do this with bokeh and it’s awesome Secondly though, it’s gonna take more effort than you probably want, primarily due to the standalone requirement. To build standalone you will need CustomJS, and you essentially need CustomJS that will do the pseudocode “SQL-ish” equivalent of:

select avg(fieldyouwantaverage), count(fieldyouwanttocount) --i.e. all your aggregations
from your CDS where CDS indices are selected --i.e. everything from your CDS that is currently selected
group by [fieldyouwanttogroupby],[anotherfieldyouwanttogroupby] --i.e. doing the aggregation in groups  as per certain fields

I would avoid writing vanilla JS code to do this, but rather look to leveraging existing JS libraries. d3 (d3.group, d3.rollup, d3.index / D3 / Observable) can do this, and maybe simpler but not as robust (e.g. no custom aggregation etc) alasql (http://alasql.org/) can also.

The overall logic would be

store your data in a CDS (i.e. the “primary” datasource)
instantiate another CDS (i.e. the “aggresult” datasource), with column names equivalent to what you want your pivot table to be and empty lists as their initial values
plot your “primary” CDS with the user ability to select it (lasso select etc.)
create a table with columns, running off off your “aggresult” datasource
write a callback that will:
a) complete the above outlined pseudo-sql logic
b) update the “aggresult” datasource with that result

I’ve built in essence the same thing for evaluating calibration statistics (i.e. goodness of model fit) over an arbitrary user-defined selection, grouping by multiple fields etc and presenting results in different tables on the fly:

calib_example

I did this with d3 because the aggregation functions are custom, i.e. not just “AVG” “COUNT” etc. It took a fair bit of work and learning but payoff was worth it Good luck!

Al7 · June 14, 2022, 7:40pm

Thank you heaps, I actually thought of doing it as you described it but all I know is the js used in the documentation and examples.

I have actually started doing what you outlined, where I did the following:

original CDS for the plotted data
instantiate an empty CDS for the tables
in a callback: store the relevant indices of the original CDS
do the necessary calculations on the rows of the original CDS at the saved indices and store them into variables
push the variables into the empty CDS to be viewed as the rows for the table

I halted my work due to my sub-basic knowledge in JS, and thought there might be a more elegant and direct way around it. I couldn’t figure out how to groupby the data based on two fields and how can I do the calculations? Should I have a variable for each element of the grouped-by fields? i.e. 2searter,compact,midsize…etc? so if i were to use the manufacturer I’d have to instantiate 16 variables?

Finally, is it possible to use D3 functions inside bokeh callbacks?

gmerritt123 · June 14, 2022, 7:55pm

Sounds like you’re on the right track.

See this example of how I “import” additional JS Resources into a bokeh html output so you have access to them in callbacks:

The rest is just google-fu and trial and error/riding the struggle bus on the JS side. D3 is an insanely powerful library but my pandas/sql-based intuition/thought process is repeatedly shot down by it , that’s why I suggested trying alasql which would let you basically write basic SQL directly in the callback.

system · September 12, 2022, 7:55pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.