Hello everyone,
I’m trying to make a chart whose number of lines is defined by the number of distinct cities in a pandas DataFrame. I want to hide and show some of the lines (maybe with a MultiSelect control) and have the y_range updated to fit the data of the visible lines. I’m able to control the hide/show part, but not the adjustment of y_range.
Here’s a DataFrame with the same structure as the one I’m dealing with:
#%% Creating dataset
import pandas as pd

data = {
    'city': ['City-1','City-2','City-3','City-4','City-1','City-2','City-3',
             'City-4','City-1','City-2','City-3','City-4','City-1','City-2',
             'City-3','City-4','City-1','City-2','City-3','City-4','City-1',
             'City-2','City-3','City-4','City-1','City-2','City-3','City-4'],
    'value': ['440770.43','48259.34','14112.54','59208.12','405397.05',
              '50299.98','19374.11','73865.03','401559.71','49777.54',
              '19906.50','60450.23','414458.60','50161.29','16739.60',
              '61169.98','411423.85','50990.14','16025.36','63231.71',
              '401162.64','51719.12','20457.94','62856.73','502449.53',
              '66137.81','22318.40','87541.79'],
    'date': ['2022-06-01','2022-06-01','2022-06-01','2022-06-01','2022-07-01',
             '2022-07-01','2022-07-01','2022-07-01','2022-08-01','2022-08-01',
             '2022-08-01','2022-08-01','2022-09-01','2022-09-01','2022-09-01',
             '2022-09-01','2022-10-01','2022-10-01','2022-10-01','2022-10-01',
             '2022-11-01','2022-11-01','2022-11-01','2022-11-01','2022-12-01',
             '2022-12-01','2022-12-01','2022-12-01']
}
df = pd.DataFrame(data)
# Convert the string columns to proper datetime and numeric dtypes
df['date'] = pd.to_datetime(df['date'])
df['value'] = pd.to_numeric(df['value'])
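To make the goal concrete, this is the range I would expect the y axis to take for a given selection. It is only a minimal sketch in plain Python: expected_y_range is a hypothetical helper name, the records list is a small excerpt of the data above, and the padding of 5 is the same arbitrary margin used in the callback further down.

```python
def expected_y_range(records, selected_cities, padding=5.0):
    """Return (start, end) covering the values of the selected cities.

    records is an iterable of (city, value) pairs. Returns None when no
    record belongs to a selected city, so the caller can decide what to do.
    """
    values = [value for city, value in records if city in selected_cities]
    if not values:
        return None
    return min(values) - padding, max(values) + padding


# Excerpt of the dataset above (the 2022-06-01 rows only)
records = [
    ('City-1', 440770.43),
    ('City-2', 48259.34),
    ('City-3', 14112.54),
    ('City-4', 59208.12),
]

# With only City-2 and City-3 visible, the range should ignore City-1
print(expected_y_range(records, {'City-2', 'City-3'}))
```

With City-1 deselected, the upper bound should drop from roughly 440,000 to roughly 48,000, which is the behavior I cannot get from the plot.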
I tried a few things:
- The first attempt used a MultiSelect control with a CustomJS callback. I tried to use the control to hide and show the lines, and this works. The problem is that plot.y_range doesn’t update according to the visible data. At the end of the CustomJS callback, plot.y_range.start and plot.y_range.end are updated, but when the callback is executed again, the y_range has the value corresponding to the entire source.
#%% Plotting using MultiSelect and CustomJS
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, MultiSelect, NumeralTickFormatter
from bokeh.plotting import figure, show

multiselect = MultiSelect(title="Cities:", value=list(df['city'].unique()),
                          options=list(df['city'].unique()))
plot = figure(width=400, height=250, title='Total monthly value - by city',
              x_axis_type='datetime')
plot.yaxis[0].formatter = NumeralTickFormatter(format="R$0,00")
colors = ['red', 'blue', 'green', 'purple', 'orange']
lines = []
for i, city in enumerate(df['city'].unique()):
    df_city = df[df['city'] == city]
    color = colors[i % len(colors)]  # Loop through the colors list cyclically
    line = plot.line(x='date', y='value', color=color, source=df_city, legend_label=city, line_width=2)
    lines.append(line)
plot.legend.location = 'top_left'
plot.legend.title = 'Cities'
source = ColumnDataSource(df)
# Update the plot's data source when the MultiSelect is interacted with
multiselect.js_on_change('value', CustomJS(args=dict(plot=plot, lines=lines, source=source, multiselect=multiselect), code="""
    const selected_cities = multiselect.value;
    const data = source.data;
    // Check if the original_data attribute exists; if not, set it to the current data
    if (!this.original_data) {
        this.original_data = {
            date: [...data['date']],
            value: [...data['value']],
            city: [...data['city']]
        };
    }
    // Filter the original data (in the browser) using the selected cities
    const filtered_data = { date: [], value: [], city: [] };
    for (let i = 0; i < this.original_data['city'].length; i++) {
        if (selected_cities.includes(this.original_data['city'][i])) {
            filtered_data.date.push(this.original_data['date'][i]);
            filtered_data.value.push(this.original_data['value'][i]);
            filtered_data.city.push(this.original_data['city'][i]);
        }
    }
    if (filtered_data.value.length > 0) {
        // Calculate the new y_range based on the filtered data for visible cities
        const visible_data = filtered_data.value.filter((val, index) => selected_cities.includes(filtered_data.city[index]));
        const min_value = Math.min(...visible_data) - 5;
        const max_value = Math.max(...visible_data) + 5;
        plot.y_range.start = min_value;
        plot.y_range.end = max_value;
        // Update the plot data source with the filtered data
        source.data = filtered_data;
    } else {
        // Reset y_range to its original values if there is no visible data
        const min_value = Math.min(...this.original_data.value);
        const max_value = Math.max(...this.original_data.value);
        plot.y_range.start = min_value;
        plot.y_range.end = max_value;
    }
    // Show only the lines whose city is currently selected
    for (let i = 0; i < lines.length; i++) {
        const city = lines[i].data_source.data['city'][0];
        const visible = selected_cities.includes(city);
        lines[i].visible = visible;
    }
    plot.change.emit();
"""))
show(column(multiselect, plot))
- A curious thing happens when I add only one line to the plot, as in the code below. Instead of the for loop, I added a single line to the lines list, while everything else remained the same (including the CustomJS callback). When I select other cities (while still selecting City-4), plot.y_range is updated, but I cannot make it work when adding the other lines to the plot.
#%% Plotting only one line
multiselect = MultiSelect(title="Cities:", value=list(df['city'].unique()),
                          options=list(df['city'].unique()))
plot = figure(width=400, height=250, title='Total monthly value - by city',
              x_axis_type='datetime')
plot.yaxis[0].formatter = NumeralTickFormatter(format="R$0,00")
colors = ['red', 'blue', 'green', 'purple', 'orange']
lines = []
lines.append(plot.line(x='date', y='value', color='blue', source=df[df['city'] == 'City-4'], legend_label='City-4', line_width=2))
plot.legend.location = 'top_left'
plot.legend.title = 'Cities'
source = ColumnDataSource(df)
# Update the plot's data source when the MultiSelect is interacted with
multiselect.js_on_change('value', CustomJS(args=dict(plot=plot, lines=lines, source=source, multiselect=multiselect), code="""
    const selected_cities = multiselect.value;
    const data = source.data;
    // Check if the original_data attribute exists; if not, set it to the current data
    if (!this.original_data) {
        this.original_data = {
            date: [...data['date']],
            value: [...data['value']],
            city: [...data['city']]
        };
    }
    // Filter the original data (in the browser) using the selected cities
    const filtered_data = { date: [], value: [], city: [] };
    for (let i = 0; i < this.original_data['city'].length; i++) {
        if (selected_cities.includes(this.original_data['city'][i])) {
            filtered_data.date.push(this.original_data['date'][i]);
            filtered_data.value.push(this.original_data['value'][i]);
            filtered_data.city.push(this.original_data['city'][i]);
        }
    }
    if (filtered_data.value.length > 0) {
        // Calculate the new y_range based on the filtered data for visible cities
        const visible_data = filtered_data.value.filter((val, index) => selected_cities.includes(filtered_data.city[index]));
        const min_value = Math.min(...visible_data) - 5;
        const max_value = Math.max(...visible_data) + 5;
        plot.y_range.start = min_value;
        plot.y_range.end = max_value;
        // Update the plot data source with the filtered data
        source.data = filtered_data;
    } else {
        // Reset y_range to its original values if there is no visible data
        const min_value = Math.min(...this.original_data.value);
        const max_value = Math.max(...this.original_data.value);
        plot.y_range.start = min_value;
        plot.y_range.end = max_value;
    }
    // Show only the lines whose city is currently selected
    for (let i = 0; i < lines.length; i++) {
        const city = lines[i].data_source.data['city'][0];
        const visible = selected_cities.includes(city);
        lines[i].visible = visible;
    }
    plot.change.emit();
"""))
show(column(multiselect, plot))
- The third attempt used figure.legend.click_policy (as in the code below) to hide the lines. Although hide and show work just fine, I cannot think of a way to make it adjust the y_range of the plot.
from bokeh.palettes import Spectral10

nplot = figure(width=400, height=250, title='Total monthly value - by city',
               x_axis_type="datetime")
nplot.yaxis[0].formatter = NumeralTickFormatter(format="$0.00", language="pt-br")
lines = []
# Column names match the example DataFrame above ('city', 'date', 'value')
for city, color in zip(df['city'].unique(), Spectral10):
    df_city = df[df['city'] == city]
    line = nplot.line(x='date', y='value', color=color, source=df_city,
                      legend_label=city, line_width=2)
    lines.append(line)
nplot.legend.location = "top_left"
nplot.legend.click_policy = "hide"
show(nplot)
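For the legend-based approach, one setting that may be relevant is the only_visible property on the default DataRange1d, which is documented as excluding hidden renderers from automatic ranging. The sketch below is untested against my real data and assumes a Bokeh version that provides that property; make_plot is a hypothetical helper name and the two-city sample frame is made up from the first rows of the dataset.

```python
import pandas as pd
from bokeh.palettes import Spectral10
from bokeh.plotting import figure


def make_plot(df):
    """Build one line per city, with auto-ranging limited to visible lines."""
    plot = figure(width=400, height=250,
                  title='Total monthly value - by city',
                  x_axis_type='datetime')
    # Assumption: the default y_range is a DataRange1d with an
    # only_visible property; hidden renderers are then ignored
    # when the range is recomputed.
    plot.y_range.only_visible = True
    for city, color in zip(df['city'].unique(), Spectral10):
        plot.line(x='date', y='value', color=color,
                  source=df[df['city'] == city],
                  legend_label=city, line_width=2)
    plot.legend.location = 'top_left'
    plot.legend.click_policy = 'hide'  # clicking a legend entry hides its line
    return plot


# Small sample with the same structure as the dataset above
sample = pd.DataFrame({
    'city': ['City-1', 'City-2', 'City-1', 'City-2'],
    'value': [440770.43, 48259.34, 405397.05, 50299.98],
    'date': pd.to_datetime(['2022-06-01', '2022-06-01',
                            '2022-07-01', '2022-07-01']),
})
plot = make_plot(sample)
```

Passing the returned plot to show() from bokeh.plotting should display it; the range is left to Bokeh's auto-ranging here rather than being set by hand, since manually assigned start and end values are what kept getting overwritten in my first attempt.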
I’m running the code with Spyder (Anaconda3) on my local machine. From my research, I’ve come to the conclusion that there isn’t a standard way to accomplish what I need. Does anyone know of a workaround?
Thanks in advance.
P.S.: English is not my first language, so I’m sorry if anything is not completely clear.