I’m looking for some guidance on how to structure a project I’m planning. Hopefully you can point me to some tools for tackling this efficiently without having to rewrite too much Python code.
I have one initial (source) dataset, structured as follows:
ID | Age | Gender | Date (DD-MM-YYYY) | Question 1
1 | 21 | Male | 02-03-2020 | 15
1 | 21 | Male | 01-04-2020 | 10
2 | 45 | Female | 02-04-2020 | 8
3 | 65 | Male | 02-03-2020 | 4
4 | 18 | Female | 02-05-2020 | 8
4 | 18 | Female | 21-03-2020 | 12
5 | 56 | Male | 28-11-2020 | 33
6 | 28 | Male | 24-09-2020 | 4
… and so on
From this dataset, I use several steps of code to aggregate the data into a new dataset with the following structure:
Month | Average of Question 1
March | 13.5
April | 9
May | 8
November | 33
September | 4
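For reference, the aggregation step can be written as a single function of the source DataFrame, so that it can be re-run on any filtered subset later. This is a minimal pandas sketch that assumes the column names shown above and day-first dates (passing `format` explicitly matters, since pandas may otherwise read 02-03-2020 as 3 February); `aggregate_by_month` is a hypothetical name. Months come out in chronological order, and because only the eight sample rows are used, the averages need not match the table above.

```python
import calendar

import pandas as pd

# Sample rows from the source table above (the real data has more rows).
raw = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 4, 5, 6],
    "Age": [21, 21, 45, 65, 18, 18, 56, 28],
    "Gender": ["Male", "Male", "Female", "Male", "Female", "Female", "Male", "Male"],
    "Date": ["02-03-2020", "01-04-2020", "02-04-2020", "02-03-2020",
             "02-05-2020", "21-03-2020", "28-11-2020", "24-09-2020"],
    "Question 1": [15, 10, 8, 4, 8, 12, 33, 4],
})
# Dates are day-first; an explicit format avoids month/day ambiguity.
raw["Date"] = pd.to_datetime(raw["Date"], format="%d-%m-%Y")


def aggregate_by_month(df: pd.DataFrame) -> pd.DataFrame:
    """Build the Month | Average of Question 1 table from (possibly filtered) source rows."""
    # groupby sorts by month number, so the result is chronological.
    grouped = df.groupby(df["Date"].dt.month)["Question 1"].mean()
    return pd.DataFrame({
        "Month": [calendar.month_name[int(m)] for m in grouped.index],
        "Average of Question 1": grouped.to_numpy(),
    })


monthly = aggregate_by_month(raw)
print(monthly)
```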
This aggregated dataset is the one I load into a dataframe at startup, and it is also the dataframe that my Bokeh-based dashboard uses.
In this dashboard, I create a bar plot (x-axis month, y-axis average of question 1) based on the second dataframe.
In the dashboard, I would like users to be able to filter the data using checkboxes, sliders, and dropdowns. For example:
The dashboard needs a dropdown for selecting the gender. Selecting “Male” should filter the initial (source) dataset down to the rows where Gender = Male. The second dataframe (the one used for the Bokeh visualization) should then be re-aggregated from this filtered source dataset, so that the bar plot updates accordingly.
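The filter-then-aggregate order described above can be sketched in plain pandas, independent of Bokeh. This is an illustration under assumptions: `apply_filters` and `avg_q1_by_month` are hypothetical helpers, a `None` filter value stands for "All", and a `(low, high)` tuple stands for a range slider such as Bokeh's `RangeSlider`, whose value is a pair.

```python
import calendar

import pandas as pd

# Sample rows from the source table; Date is day-first (DD-MM-YYYY).
raw = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 4, 5, 6],
    "Age": [21, 21, 45, 65, 18, 18, 56, 28],
    "Gender": ["Male", "Male", "Female", "Male", "Female", "Female", "Male", "Male"],
    "Date": pd.to_datetime(
        ["02-03-2020", "01-04-2020", "02-04-2020", "02-03-2020",
         "02-05-2020", "21-03-2020", "28-11-2020", "24-09-2020"],
        format="%d-%m-%Y",
    ),
    "Question 1": [15, 10, 8, 4, 8, 12, 33, 4],
})


def apply_filters(df: pd.DataFrame, filters: dict) -> pd.DataFrame:
    """Return the source rows that pass every active filter.

    None means "no filter" (e.g. a dropdown set to "All"); a (low, high)
    tuple is an inclusive range (e.g. from a range slider); any other
    value must match exactly (e.g. a dropdown choice).
    """
    mask = pd.Series(True, index=df.index)
    for column, value in filters.items():
        if value is None:
            continue
        if isinstance(value, tuple):
            mask &= df[column].between(*value)
        else:
            mask &= df[column] == value
    return df[mask]


def avg_q1_by_month(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate the already-filtered rows into Month | Average of Question 1."""
    grouped = df.groupby(df["Date"].dt.month)["Question 1"].mean()
    return pd.DataFrame({
        "Month": [calendar.month_name[int(m)] for m in grouped.index],
        "Average of Question 1": grouped.to_numpy(),
    })


# Filter the source first, then aggregate the filtered rows.
males = avg_q1_by_month(apply_filters(raw, {"Gender": "Male"}))
young_males = avg_q1_by_month(apply_filters(raw, {"Gender": "Male", "Age": (18, 30)}))
print(males)
print(young_males)
```

Because the aggregation only ever sees the filtered rows, each widget reduces to editing one entry of the `filters` dict.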
I know how to carry out most of the individual steps, but I am having trouble working out the best way to structure this project. This is just one example: I have many more separate datasets that are all derived from the same source data. So I am looking for an approach that is efficient to implement. What would be a good way to approach this?
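One possible structure for many derived datasets, sketched for a `bokeh serve` app: load the raw source frame once, hold all widget selections in one shared filter dict, and register each chart as an (aggregation function, `ColumnDataSource`, figure) triple. A single `refresh()` filters the source once and re-runs every registered aggregation, so adding a chart means writing one aggregation function. The names `FILTERS`, `VIEWS`, `add_bar_view` and `refresh`, and the second "responses per month" chart, are hypothetical; only the Bokeh classes, `on_change` and `curdoc` are real API.

```python
import calendar

import pandas as pd
from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, FactorRange, Select
from bokeh.plotting import figure

# Source data: loaded once, never mutated (sample rows, subset of columns).
RAW = pd.DataFrame({
    "ID": [1, 1, 2, 3, 4, 4, 5, 6],
    "Gender": ["Male", "Male", "Female", "Male", "Female", "Female", "Male", "Male"],
    "Date": pd.to_datetime(
        ["02-03-2020", "01-04-2020", "02-04-2020", "02-03-2020",
         "02-05-2020", "21-03-2020", "28-11-2020", "24-09-2020"],
        format="%d-%m-%Y",
    ),
    "Question 1": [15, 10, 8, 4, 8, 12, 33, 4],
})

# Shared filter state; each widget writes one key (None = no filter).
FILTERS = {"Gender": None}


def apply_filters(df):
    mask = pd.Series(True, index=df.index)
    for col, value in FILTERS.items():
        if value is not None:
            mask &= df[col] == value
    return df[mask]


def by_month(series):
    """Turn a month-number-indexed Series into bar-chart data {"x", "y"}."""
    return {"x": [calendar.month_name[int(m)] for m in series.index],
            "y": series.tolist()}


# Each aggregation takes the *filtered source* and returns {"x": [...], "y": [...]}.
def avg_q1_per_month(df):
    return by_month(df.groupby(df["Date"].dt.month)["Question 1"].mean())


def responses_per_month(df):  # hypothetical second derived dataset
    return by_month(df.groupby(df["Date"].dt.month).size())


VIEWS = []  # (aggregation function, ColumnDataSource, figure)


def add_bar_view(title, aggregate):
    source = ColumnDataSource(data={"x": [], "y": []})
    fig = figure(x_range=FactorRange(), title=title, toolbar_location=None)
    fig.vbar(x="x", top="y", width=0.8, source=source)
    VIEWS.append((aggregate, source, fig))
    return fig


def refresh():
    """Filter the source once, then rebuild every registered view from it."""
    filtered = apply_filters(RAW)
    for aggregate, source, fig in VIEWS:
        data = aggregate(filtered)
        fig.x_range.factors = data["x"]
        source.data = data


def on_gender_change(attr, old, new):
    FILTERS["Gender"] = None if new == "All" else new
    refresh()


gender_select = Select(title="Gender", value="All", options=["All", "Male", "Female"])
gender_select.on_change("value", on_gender_change)

figs = [add_bar_view("Average of Question 1 per month", avg_q1_per_month),
        add_bar_view("Responses per month", responses_per_month)]
refresh()
curdoc().add_root(column(gender_select, *figs))
```

Saved as, say, `app.py`, this would be started with `bokeh serve --show app.py`. A range slider would add its own key to `FILTERS` and extend `apply_filters` accordingly (for example with `Series.between`), while the charts themselves stay untouched.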
Many thanks in advance!