Plot events by category vs time, with multiple events in one category

Consider a dummy data set of events.

df = pd.DataFrame(columns=['category', 'start','end','value','type'], data=[['A','2024-06-01','2024-06-10',0.5,'normal'],['B','2024-05-27','2024-06-16',.8,'normal'],['C','2024-06-04','2024-06-12',.3,'unusual']])

Each row defines an event with a set duration between the start and end dates, as well as a value and an event type.

The chart would show all of these events as rectangles. Time would be on the horizontal axis. The categories (A,B,C) would be on the vertical axis. The start and end dates would establish the start and end of the rectangles to show the duration of the events, which could be coloured based on their type. The height of the rectangles would be established by the ‘value’.

I have achieved this with bokeh and it works if there is only one event for each category. See my code:

import pandas as pd

from bokeh.plotting import figure, show, output_notebook
from bokeh.models import ColumnDataSource, HoverTool, Range1d
output_notebook()

df = pd.DataFrame(columns=['category', 'start','end','value','type'], data=[['A','2024-06-01','2024-06-10',0.5,'normal'],['B','2024-05-27','2024-06-16',.8,'normal'],['C','2024-06-04','2024-06-12',.3,'unusual']])

c = {'normal':'green', 'unusual':'red'}
df['bottom'] = df.index + 0.5 - df['value']/2
df['top'] = df['bottom'] + df['value']
df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime)
df['color'] = df['type'].map(c)

G = figure(title='Events', x_axis_type='datetime', width=800, height=400, y_range=df.category,
           x_range=Range1d(df.start.min(), df.end.max()), tools='save')

hover = HoverTool(tooltips="Category: @category<br>\
Start: @start<br>\
End: @end")
G.add_tools(hover)

CDS = ColumnDataSource(df)
G.quad(left='start', right='end', bottom='bottom', top='top', source=CDS, color="color")

show(G)

But if I have more than one event for a given category, the code breaks.

df = pd.DataFrame(columns=['category', 'start','end','value','type'], data=[['A','2024-06-01','2024-06-10',0.5,'normal'],['B','2024-05-27','2024-06-16',.8,'normal'],['C','2024-06-04','2024-06-12',.3,'unusual'],['C','2024-05-06','2024-05-20',.8,'normal']])

This is the error:

ERROR:bokeh.core.validation.check:E-1019 (DUPLICATE_FACTORS): FactorRange must specify a unique list of categorical factors for an axis: duplicate factors found: 'C'

Categories with multiple events should simply show multiple rectangles at the same level on the vertical axis. How can I achieve this?

You need to remove duplicates from the list you pass to y_range. The y_range value configures the axis, so it can contain each category only once, in the order you want the categories displayed on the axis.
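As a minimal sketch of that fix, assuming the four-row df from the question: pandas' unique() drops repeated categories while keeping first-appearance order, and the resulting list can be passed as y_range. The name factors is only illustrative.

```python
import pandas as pd
from bokeh.plotting import figure

# The four-event data set from the question (category 'C' appears twice).
df = pd.DataFrame(
    columns=['category', 'start', 'end', 'value', 'type'],
    data=[['A', '2024-06-01', '2024-06-10', 0.5, 'normal'],
          ['B', '2024-05-27', '2024-06-16', 0.8, 'normal'],
          ['C', '2024-06-04', '2024-06-12', 0.3, 'unusual'],
          ['C', '2024-05-06', '2024-05-20', 0.8, 'normal']])

# unique() keeps the order of first appearance, so each category
# ends up on the axis exactly once.
factors = df['category'].unique().tolist()

# Passing the de-duplicated list avoids the DUPLICATE_FACTORS error.
G = figure(title='Events', x_axis_type='datetime', width=800, height=400,
           y_range=factors, tools='save')
```

This only configures the axis; the data source can still hold any number of rows per category.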

Thank you. Hmm… I understand what you are saying. How could I then have more than one event for a given category? Is that achievable with Bokeh?

You are conflating two things that are completely separate and distinct:

  • configuring what shows up on an axis
  • plotting what is in a data source

I hope it makes sense that, in a plot where the categories are “months of the year”, each month should only appear once on the axis, regardless of how many points there are for a given month. Or conversely, if there are 387 points for “June” then you don’t want the axis to display the text “June” 387 times.

Here is an example with thousands of points, split across “days of the week”. There are only seven categories on the axis (one for each day of the week) but all the points are plotted (because they are in the data source):

https://docs.bokeh.org/en/latest/docs/user_guide/topics/categorical.html#categorical-scatter-plots-with-jitter

I would encourage you to read and understand that entire chapter on categorical plotting in Bokeh.
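Applied to the events data from the question, the same separation might look like the sketch below. It is one possible approach rather than the only one: it swaps the question's quad glyph (with hand-computed bottom and top) for hbar, which takes the category name directly as its y coordinate and centres each bar on that category. It assumes a bar height given in categorical units, where each category occupies a band of 1.0, so the 'value' column can be used as the height unchanged.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool, Range1d
from bokeh.plotting import figure

df = pd.DataFrame(
    columns=['category', 'start', 'end', 'value', 'type'],
    data=[['A', '2024-06-01', '2024-06-10', 0.5, 'normal'],
          ['B', '2024-05-27', '2024-06-16', 0.8, 'normal'],
          ['C', '2024-06-04', '2024-06-12', 0.3, 'unusual'],
          ['C', '2024-05-06', '2024-05-20', 0.8, 'normal']])

df[['start', 'end']] = df[['start', 'end']].apply(pd.to_datetime)
df['color'] = df['type'].map({'normal': 'green', 'unusual': 'red'})

# Axis configuration: every category listed exactly once.
factors = df['category'].unique().tolist()

G = figure(title='Events', x_axis_type='datetime', width=800, height=400,
           y_range=factors,
           x_range=Range1d(df['start'].min(), df['end'].max()),
           tools='save')

# Show the dates in the tooltip as YYYY-MM-DD instead of raw milliseconds.
G.add_tools(HoverTool(
    tooltips=[('Category', '@category'),
              ('Start', '@start{%F}'),
              ('End', '@end{%F}')],
    formatters={'@start': 'datetime', '@end': 'datetime'}))

# Data: all four events stay in the source, including both 'C' rows.
source = ColumnDataSource(df)

# One horizontal bar per row, centred on its category.
G.hbar(y='category', left='start', right='end', height='value',
       color='color', source=source)
```

Calling show(G) afterwards (with output_notebook() first when working in a notebook) should render both 'C' events on the same row of the axis.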

Hi @Bryan. I’ve understood. It makes sense and it works. Thank you for your help.