How to build a pivot table in a standalone file?

Hello everyone,

Simply I want to create a pivot table in a standalone file, no servers or outside help. Of course the values of the pivot table will be taken from the selected CDS.

How can I get around this? Extending Bokeh is extremely overwhelming for me to understand at this point. Baby steps would be appreciated.

A pivot table can mean a lot of things, the biggest piece is if you need aggregation (in the event you have duplicate indices) or not (i.e. every index is unique and you’re just basically transposing). Can you provide a Minimal Reproducible Example that has a CDS with data, and a description of what you’d like the pivot piece to do?

Thank you for your reply. I have used auto-mpg2 sample data from bokeh. The expected results-- loaded using excel pivot table, should be:

Row Labels Average of cyl Average of cty Average of hwy Count of model
2seater 8 15.4 24.8 5
compact 4.595744681 20.12765957 28.29787234 47
midsize 5.317073171 18.75609756 27.29268293 41
minivan 5.818181818 15.81818182 22.36363636 11
pickup 7.03030303 13 16.87878788 33
subcompact 5.028571429 20.37142857 28.14285714 35
suv 6.967741935 13.5 18.12903226 62
Grand Total 5.888888889 16.85897436 23.44017094 234

Now let’s say I have a multi-select with manufacturer as option, once I deselect one of them The above table should be updated.

Below table is after I removed Audi:

Row Labels Average of cyl Average of cty Average of hwy Count of model
2seater 8 15.4 24.8 5
compact 4.4375 21.15625 28.9375 32
midsize 5.210526316 18.97368421 27.55263158 38
minivan 5.818181818 15.81818182 22.36363636 11
pickup 7.03030303 13 16.87878788 33
subcompact 5.028571429 20.37142857 28.14285714 35
suv 6.967741935 13.5 18.12903226 62
Grand Total 5.944444444 16.7962963 23.18981481 216

Or selecting only Audi:

Row Labels Average of cyl Average of cty Average of hwy Count of model
compact 4.933333333 17.93333333 26.93333333 15
midsize 6.666666667 16 24 3
Grand Total 5.222222222 17.61111111 26.44444444 18

The columns themselves can be predefined, it doesn’t have to be dynamic with field selection options like excel.

Right so you are predefining columns, predefining the aggregation (3 averages and a count). Are you predefining fields to groupby (in your example you are grouping by car type - compact/midsize etc). Are you only providing one groupby field at a time (e.g. only grouping data by car type, and never grouping by car type AND number of cylinders)?

I would love to have the option to group by two fields, and yes I will be predefining both groupby fields.

Thank you so much for all the help and support @gmerritt123

Ok. I may not have bandwidth to write this all out explicitly for you but I can give some guidance on logic/what you need to implement.

First of all, you absolutely can do this with bokeh and it’s awesome :slight_smile: Secondly though, it’s gonna take more effort than you probably want, primarily due to the standalone requirement. To build standalone you will need CustomJS, and you essentially need CustomJS that will do the pseudocode “SQL-ish” equivalent of:

select avg(fieldyouwantaverage), count(fieldyouwanttocount) --i.e. all your aggregations
from your CDS where CDS indices are selected --i.e. everything from your CDS that is currently selected
group by [fieldyouwanttogroupby],[anotherfieldyouwanttogroupby] --i.e. doing the aggregation in groups  as per certain fields

I would avoid writing vanilla JS code to do this, but rather look to leveraging existing JS libraries. d3 (d3.group, d3.rollup, d3.index / D3 / Observable) can do this, and maybe simpler but not as robust (e.g. no custom aggregation etc) alasql (http://alasql.org/) can also.

The overall logic would be

  • store your data in a CDS (i.e. the “primary” datasource)
  • instantiate another CDS (i.e. the “aggresult” datasource), with column names equivalent to what you want your pivot table to be and empty lists as their initial values
  • plot your “primary” CDS with the user ability to select it (lasso select etc.)
  • create a table with columns, running off off your “aggresult” datasource
  • write a callback that will:
    a) complete the above outlined pseudo-sql logic
    b) update the “aggresult” datasource with that result

I’ve built in essence the same thing for evaluating calibration statistics (i.e. goodness of model fit) over an arbitrary user-defined selection, grouping by multiple fields etc and presenting results in different tables on the fly:

calib_example

I did this with d3 because the aggregation functions are custom, i.e. not just “AVG” “COUNT” etc. It took a fair bit of work and learning but payoff was worth it :smiley: Good luck!

1 Like

Thank you heaps, I actually thought of doing it as you described it but all I know is the js used in the documentation and examples.

I have actually started doing what you outlined, where I did the following:

  1. original CDS for the plotted data
  2. instantiate an empty CDS for the tables
  3. in a callback: store the relevant indices of the original CDS
  4. do the necessary calculations on the rows of the original CDS at the saved indices and store them into variables
  5. push the variables into the empty CDS to be viewed as the rows for the table

I halted my work due to my sub-basic knowledge in JS, and thought there might be a more elegant and direct way around it. I couldn’t figure out how to groupby the data based on two fields and how can I do the calculations? Should I have a variable for each element of the grouped-by fields? i.e. 2searter,compact,midsize…etc? so if i were to use the manufacturer I’d have to instantiate 16 variables? :hot_face:

Finally, is it possible to use D3 functions inside bokeh callbacks?

Sounds like you’re on the right track.

See this example of how I “import” additional JS Resources into a bokeh html output so you have access to them in callbacks:

The rest is just google-fu and trial and error/riding the struggle bus on the JS side. D3 is an insanely powerful library but my pandas/sql-based intuition/thought process is repeatedly shot down by it :sweat_smile: , that’s why I suggested trying alasql which would let you basically write basic SQL directly in the callback.

1 Like