I have created a simple test case below. Basically, I have a three-column dataframe where the “DAY” and “VALUE” columns are the x and y coordinates of a time series. The third column, “REF_START_DAY”, points to a different location on the same time series (the start day of a reference segment). The slider moves across the time series and displays a 500-day segment starting at slider.value (blue), along with its corresponding 500-day reference segment starting at “REF_START_DAY” (red).
#!/usr/bin/env python
import numpy as np
import pandas as pd
from bokeh.plotting import figure, curdoc
from bokeh.layouts import layout
from bokeh.models import ColumnDataSource, Slider
window = 500
n = 10000
df = pd.DataFrame()
df['DAY'] = range(n)
df['VALUE'] = np.random.uniform(-1000, 1000, [n])
df['REF_START_DAY'] = np.random.randint(0, n-window, n)
idx = df.iloc[[0], :].loc[:, 'REF_START_DAY'].values[0]
cds = ColumnDataSource(dict(
    x=df.iloc[0:window, :].loc[:, 'DAY'],
    y1=df.iloc[0:window, :].loc[:, 'VALUE'],
    y2=df.iloc[idx:idx+window, :].loc[:, 'VALUE'],
))
sizing_mode = 'stretch_both'
p = figure(sizing_mode=sizing_mode)
line1 = p.line(x='x', y='y1', source=cds, color='blue')
line2 = p.line(x='x', y='y2', source=cds, color='red')
slider = Slider(start=0.0, end=10000-window, value=0, step=1, sizing_mode=sizing_mode)
def update(attr, old, new):
    data = line1.data_source.data
    data['y1'] = df.iloc[slider.value:slider.value+window, :].loc[:, 'VALUE'].tolist()
    # The below two lines cause the delay between the slider and plot update
    idx = df.iloc[[slider.value], :].loc[:, 'REF_START_DAY'].values[0]
    data['y2'] = df.iloc[idx:idx+window, :].loc[:, 'VALUE'].tolist()
slider.on_change('value', update)
l = layout(children=[[p],[slider]], sizing_mode=sizing_mode)
curdoc().add_root(l)
The plot updates quickly if I’m only updating the blue curve. However, if I update both the blue and the red curves at the same time, I see a significant delay. Essentially, both curves are just subsets of the larger dataframe, and each plotted column has a length of only 500 (much shorter than the dataframe length). I want to get rid of as much of the delay as possible.
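One way to picture the update is as a single function that builds both 500-day segments from the slider position. The sketch below is only an illustration under assumptions: `build_window_data` is a hypothetical helper (not part of Bokeh or the script above), and `values` and `ref_starts` are assumed to be plain lists or NumPy arrays, for example `df['VALUE'].to_numpy()` and `df['REF_START_DAY'].to_numpy()`. Assigning the returned dict to `cds.data` in one step would replace all columns together rather than modifying `y1` and `y2` one after another; whether that removes the delay in this case is untested.

```python
def build_window_data(values, ref_starts, start, window):
    """Return the x, y1 and y2 columns for the window beginning at `start`.

    Hypothetical helper. `values` and `ref_starts` are assumed to be
    plain sequences (lists or NumPy arrays), not pandas objects.
    """
    start = int(start)            # slider values may arrive as floats
    ref = int(ref_starts[start])  # start day of the reference segment
    return {
        'x': list(range(window)),                  # x stays 0..window-1, as in the script
        'y1': list(values[start:start + window]),  # blue segment
        'y2': list(values[ref:ref + window]),      # red reference segment
    }


# Assumed use inside the slider callback (not run here): one assignment
# replaces every column together.
# cds.data = build_window_data(values, ref_starts, slider.value, window)

# Tiny stand-in data so the helper can be checked without Bokeh.
demo = build_window_data(
    values=list(range(10)),
    ref_starts=[3, 0, 5, 1, 2, 0, 0, 0, 0, 0],
    start=2,
    window=3,
)
print(demo)
```

Converting the dataframe columns to arrays once, outside the callback, also keeps the repeated `iloc`/`loc` lookups out of the update path.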