Optimize Slider Callback Update

I have created a simple test case below. Basically, I have a three-column dataframe where the “DAY” and “VALUE” columns correspond to the x and y coordinates of a time series. The third column, “REF_START_DAY”, gives a different location (referenced by its start day) on the same time series. The slider moves across the time series and displays a 500-day segment starting at slider.value (blue) along with its corresponding 500-day reference segment starting at “REF_START_DAY” (red).

#!/usr/bin/env python

import numpy as np
import pandas as pd
from bokeh.plotting import figure, curdoc
from bokeh.layouts import layout
from bokeh.models import ColumnDataSource, Slider

window = 500
n = 10000

df = pd.DataFrame()
df['DAY'] = range(n)
df['VALUE'] = np.random.uniform(-1000, 1000, [n])
df['REF_START_DAY'] = np.random.randint(0, n-window, n)

idx = df.iloc[[0], :].loc[:, 'REF_START_DAY'].values[0]

cds = ColumnDataSource(dict(
    x=df.iloc[0:window, :].loc[:, 'DAY'],
    y1=df.iloc[0:window, :].loc[:, 'VALUE'],
    y2=df.iloc[idx:idx+window, :].loc[:, 'VALUE'],
))

sizing_mode = 'stretch_both'

p = figure(sizing_mode=sizing_mode)
line1 = p.line(x='x', y='y1', source=cds, color='blue')
line2 = p.line(x='x', y='y2', source=cds, color='red')

slider = Slider(start=0.0, end=10000-window, value=0, step=1, sizing_mode=sizing_mode)

def update(attr, old, new):
    data = line1.data_source.data
    data['y1'] = df.iloc[slider.value:slider.value+window, :].loc[:, 'VALUE'].tolist()
    # The below two lines cause the delay between the slider and plot update
    idx = df.iloc[[slider.value], :].loc[:, 'REF_START_DAY'].values[0]
    data['y2'] = df.iloc[idx:idx+window, :].loc[:, 'VALUE'].tolist()

slider.on_change('value', update)

l = layout(children=[[p],[slider]], sizing_mode=sizing_mode)

curdoc().add_root(l)

The plot updates quickly if I’m only updating the blue curve. However, if I update both the blue and the red at the same time, I see a significant delay. Essentially, both curves are just subsets of the larger dataframe, and each column in the data source has a length of only 500 (much shorter than the dataframe). I want to get rid of as much of the delay as possible.

I believe it is best practice to update all the columns of a column data source simultaneously when updating more than one column. Below is a trimmed-down version of your callback that feels a bit snappier:

def update(attr, old, new):
    y1_df = df.loc[slider.value:slider.value+window-1, :]
    idx = y1_df['REF_START_DAY'].values[0]
    y1 = y1_df['VALUE']
    y2 = df['VALUE'].loc[idx:idx+window-1]
    cds.data = {
        'y1': y1,
        'y2': y2,
        'x': cds.data['x'],
    }

I changed some of the indexing steps on df and changed the way cds is updated so that y1 and y2 are updated at the same time. It is important to include all fields when updating a column data source this way; any fields that aren’t included will be discarded in the update. Also, updating the data source itself updates all associated glyphs as well. This doesn’t improve performance, but it has a cleaner look to it :wink:
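To illustrate the point about replacing every column in one assignment, here is a minimal sketch that runs without Bokeh or pandas. The helper name build_window_data and the plain-list stand-ins for the dataframe columns are assumptions made for illustration; they are not part of the original code. The idea is only that the callback builds one complete dict (x, y1 and y2 together) and hands it over in a single step.

```python
import random

WINDOW = 500
N = 10000

# Hypothetical stand-ins for the 'VALUE' and 'REF_START_DAY' columns,
# extracted once up front instead of being looked up in the dataframe
# on every slider event.
random.seed(0)
values = [random.uniform(-1000, 1000) for _ in range(N)]
ref_start_days = [random.randrange(0, N - WINDOW) for _ in range(N)]
x = list(range(WINDOW))  # x stays fixed, as in the original callback


def build_window_data(start):
    """Return every column of the data source for the window starting at `start`."""
    ref = ref_start_days[start]
    return {
        "x": x,
        "y1": values[start:start + WINDOW],
        "y2": values[ref:ref + WINDOW],
    }


# In the Bokeh app, the callback would then make one assignment:
#     def update(attr, old, new):
#         cds.data = build_window_data(int(new))
new_data = build_window_data(10)
```

Because the returned dict always carries all three keys, no column can be dropped by accident when it is assigned to cds.data.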

(In reply to the message of Thu, Mar 9, 2017 at 10:53 AM.)

You received this message because you are subscribed to the Google Groups “Bokeh Discussion - Public” group.

To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].

To post to this group, send email to [email protected].

To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/6a00b3f4-9578-4ba8-81a8-d446eb3d71ee%40continuum.io.

For more options, visit https://groups.google.com/a/continuum.io/d/optout.

Thanks, that did the trick!

(In reply to Tyler Nickerson’s message of Friday, March 10, 2017 at 6:56:38 PM UTC-5.)