Filter dataframe before Bokeh visualization

Hi all,

I’m looking for some guidance on how to structure a project I’m planning. Hopefully you can point me to some tools for tackling this efficiently, without having to rewrite too much Python code.

Current situation:
I have one initial (source) dataset, structured as follows:

ID | Age | Gender | Date | Question 1

1 | 21 | Male | 02-03-2020 | 15
1 | 21 | Male | 01-04-2020 | 10
2 | 45 | Female | 02-04-2020 | 8
3 | 65 | Male | 02-03-2020 | 4
4 | 18 | Female | 02-05-2020 | 8
4 | 18 | Female | 21-03-2020 | 12
5 | 56 | Male | 28-11-2020 | 33
6 | 28 | Male | 24-09-2020 | 4
… and so on

Based on this dataset, using various lines of code, I aggregate some data into a new dataset with the following structure:

Month | Average of Question 1

March | 13.5
April | 9
May | 8
November | 33
September | 4

This is the dataset that I initially load into a dataframe, and this is also the dataframe that I am using in a Bokeh based dashboard.

In this dashboard, I create a bar plot (x-axis month, y-axis average of question 1) based on the second dataframe.

Required situation:
In the dashboard, I would like the user to be able to filter the data, using checkboxes/sliders/dropdowns. An example:
The dashboard needs to have a dropdown where you can select the gender. Selecting gender “Male” should filter the initial (= source) dataset down to only the rows where Gender = Male. Subsequently, the second dataframe (the one that’s used for the data visualization in Bokeh) should be re-aggregated from this filtered source dataset, and the bar plot should then update accordingly.
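To make the intended flow concrete, here is a minimal pandas sketch of the filter-then-aggregate step. It rests on several assumptions: the function names `filter_source` and `monthly_average` are hypothetical, the dates are taken to be in DD-MM-YYYY format, and only the eight rows shown above are used, so the averages need not match the aggregated table in this post.

```python
import pandas as pd

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Only the rows visible in the post; the real source dataset is larger.
source_df = pd.DataFrame(
    [
        (1, 21, "Male", "02-03-2020", 15),
        (1, 21, "Male", "01-04-2020", 10),
        (2, 45, "Female", "02-04-2020", 8),
        (3, 65, "Male", "02-03-2020", 4),
        (4, 18, "Female", "02-05-2020", 8),
        (4, 18, "Female", "21-03-2020", 12),
        (5, 56, "Male", "28-11-2020", 33),
        (6, 28, "Male", "24-09-2020", 4),
    ],
    columns=["ID", "Age", "Gender", "Date", "Question 1"],
)
# Assumption: dates are day-month-year.
source_df["Date"] = pd.to_datetime(source_df["Date"], format="%d-%m-%Y")


def filter_source(df, gender=None):
    """Return only the rows matching the selected gender (None keeps all rows)."""
    if gender is None:
        return df
    return df[df["Gender"] == gender]


def monthly_average(df, column="Question 1"):
    """Aggregate a (possibly filtered) source frame to one mean value per month."""
    means = df.groupby(df["Date"].dt.month)[column].mean()
    means.index = [MONTHS[m - 1] for m in means.index]  # month number -> name
    return means.rename_axis("Month").reset_index(name="Average")


male_by_month = monthly_average(filter_source(source_df, gender="Male"))
print(male_by_month)
```

Keeping the filter and the aggregation as separate functions means any further aggregated datasets can reuse the same filtered source frame.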

For most of the individual steps I know how to execute them, but I am having difficulty working out the best way to structure this project. This is just an example; I have many more separate aggregated datasets that all use the same source dataset, so I am looking for an approach that is efficient to implement :) What would be a good way to approach this?

Many thanks in advance!


Hi all,

I was hoping that someone would be able to point me in the right direction. Is there anyone with a suggestion on how to set this up? :)


Hi @jmmt. Generally speaking, it is much easier for folks here to provide support when starting from some actual code. Otherwise, there is a risk of spending time writing up an example or response, only to find out you answered the wrong question or tackled a different problem than was intended. Speaking plainly, that risk is a huge disincentive to engage with questions. Actual code always helps clarify things and removes the need for speculation.

So, my meta suggestion is this: try running the Bokeh server app examples [1] in bokeh/examples/app (branch-2.4 of the bokeh/bokeh GitHub repository), look for ones that do things in the ballpark of what you are trying to do, and then formulate some specific, narrower questions around those. Ideally, try to create a toy example that does some of what you want; then we can help with the specific problems you encounter. The “Running a Bokeh server” page of the Bokeh 2.3.0 documentation will also be useful to refer to.

  1. Assuming a Bokeh server app, because if you want to use things like DataFrames directly in callbacks/updates, then a Bokeh server app is the only option.
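As a starting point for such a toy example, a Bokeh server app along these lines could work. This is only a sketch under assumptions: the file name `main.py`, the helper `monthly_average`, the "All" dropdown option, and the three sample rows are all hypothetical, and dates are taken to be DD-MM-YYYY.

```python
# main.py (hypothetical name); run with: bokeh serve --show main.py
import pandas as pd
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Tiny stand-in for the real source dataset.
raw = pd.DataFrame({
    "Gender": ["Male", "Male", "Female"],
    "Date": ["02-03-2020", "01-04-2020", "02-04-2020"],
    "Question 1": [15, 10, 8],
})
raw["Date"] = pd.to_datetime(raw["Date"], format="%d-%m-%Y")


def monthly_average(df, gender="All"):
    """Filter by gender, then return per-month means as a dict for the plot."""
    if gender != "All":
        df = df[df["Gender"] == gender]
    means = df.groupby(df["Date"].dt.month)["Question 1"].mean()
    return {
        "month": [MONTHS[m - 1] for m in means.index],
        "average": means.tolist(),
    }


# The bar glyph reads from this source; replacing its data redraws the bars.
source = ColumnDataSource(data=monthly_average(raw))

plot = figure(x_range=MONTHS, title="Average of Question 1 per month")
plot.vbar(x="month", top="average", width=0.8, source=source)

select = Select(title="Gender", value="All", options=["All", "Male", "Female"])


def update(attr, old, new):
    # Runs in Python on the server, so the DataFrame is available here.
    source.data = monthly_average(raw, new)


select.on_change("value", update)
curdoc().add_root(column(select, plot))
```

The callback only recomputes the aggregate and assigns it to `source.data`; the plot itself is created once and never rebuilt.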