Source data usage

grzegorz.malinowski · June 4, 2020, 9:45am

I’m wondering what the correct way is to assign data that is divided or multiplied by a constant, summed, or averaged (examples: np.sum(data), np.cumsum(data/np.sum(data))) to y without using ColumnDataSource. I’d appreciate any suggestions.

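import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Placeholder labels; the original post did not show how category was defined.
category = ['a', 'b', 'c', 'd', 'e']
data = np.array([3, 2, 5, 6, 2])

s_data = {'category': category,
          'y_data': data,
          'proportion': data/np.sum(data),
          'cum_proportion': np.cumsum(data/np.sum(data))}

source = ColumnDataSource(data=s_data)

# A categorical x-axis needs the factors passed as x_range.
plot_S = figure(x_range=category)
plot_S.line(x='category', y='proportion', source=source)
plot_S.line(x='category', y='cum_proportion', source=source)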

p-himik · June 4, 2020, 10:30am

What do you mean by “not using ColumnDataSource”? And why do you want to avoid using it?

grzegorz.malinowski · June 4, 2020, 10:37am

This is mainly because the data is already defined in the CDS: 'y_data': data
So the rest is just simple math operations on this variable. I was thinking about something like this:

plot_S.line(x='category', y='y_data'.cumsum(), source=source)

p-himik · June 4, 2020, 11:10am

There are some built-in facilities for that: bokeh.transform — Bokeh 2.4.2 Documentation
As you can see, cumsum already exists there. If you find something missing, it’s usually pretty simple to implement using CustomJSTransform.
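
As a rough illustration of the built-in transform mentioned above, the sketch below plots a running total of y_data without adding a second column to the source. The numeric x column and the variable names are assumptions made for the example. Note that cumsum only accumulates, so the normalized cum_proportion from the first post would still need a precomputed column or a custom transform.

```python
import numpy as np

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import cumsum

data = np.array([3, 2, 5, 6, 2])

# Only the raw values live in the data source.
source = ColumnDataSource(data=dict(x=np.arange(len(data)), y_data=data))

p = figure()

# cumsum() builds a browser-side expression, so the running total
# is computed by BokehJS from the existing 'y_data' column.
renderer = p.line(x="x", y=cumsum("y_data"), source=source)

# NumPy equivalent of the values the browser should end up plotting.
expected = np.cumsum(data)
print(expected)
```

Calling show(p) from bokeh.plotting would render the plot in a browser.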

grzegorz.malinowski · June 4, 2020, 11:25am

Can you please list all the built-in math functions (or send a link)? cumsum is only one of such functions. The same goes for CustomJSTransform. I’d like to read more about it.

p-himik · June 4, 2020, 11:29am

The link is right there in my previous message. If you want additional functions to be built into Bokeh, please create a feature request on GitHub.

The documentation website has a search capability. If you follow the link above, the search field will be in the top left corner.

grzegorz.malinowski · June 8, 2020, 6:58pm

CustomJSTransform is challenging for me. Could you please give us, as a reference point, a piece of code showing how to compute division by a scalar (say, 5)?

_jm · June 8, 2020, 8:47pm

@grzegorz.malinowski

Here’s a basic example where the y data are a sine wave, and the plotted signal is the same wave scaled to 1/5 of its amplitude using a CustomJSTransform.

The v_func argument is the vectorized form of the transform: its JavaScript body receives the whole column as the array xs (in this case a sampled sine wave) and must return an array of equal length containing the transformed values.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show

from bokeh.transform import transform
from bokeh.models.transforms import CustomJSTransform

v_func = '''
    var rv = new Float64Array(xs.length)
    for(let i = 0; i < xs.length; i++) {
        rv[i] = xs[i] / 5.0
    }
    return rv
'''

x = np.linspace(0.0,1.0,101)
y = np.sin(2.0*np.pi*x)

data = dict(x=x, y=y)

p = figure(width=500, height=500)
ry = p.line(x='x',y=transform('y', CustomJSTransform(v_func=v_func)), source=ColumnDataSource(data=data))

show(p)
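
A possible variation on the example above, in case the constant should not be hard-coded in the JavaScript: CustomJSTransform also takes an args dictionary and a scalar func alongside v_func. This is only a sketch; the name divisor is hypothetical, and it assumes a Bokeh version whose args accepts plain numbers as values.

```python
import numpy as np

from bokeh.models import ColumnDataSource, CustomJSTransform
from bokeh.plotting import figure
from bokeh.transform import transform

x = np.linspace(0.0, 1.0, 101)
y = np.sin(2.0 * np.pi * x)
source = ColumnDataSource(data=dict(x=x, y=y))

# 'divisor' is a hypothetical name; every key in args becomes a
# variable that the JavaScript snippets below can read.
scale = CustomJSTransform(
    args=dict(divisor=5.0),
    func="return x / divisor",                 # single value in, single value out
    v_func="return xs.map(v => v / divisor)",  # whole array in, same-length array out
)

p = figure()
p.line(x="x", y=transform("y", scale), source=source)
```

Keeping the constant in args means the same JavaScript body can be reused with a different divisor by building another CustomJSTransform.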

grzegorz.malinowski · June 8, 2020, 10:01pm

Many thanks @_jm. This is very useful.

grzegorz.malinowski · June 16, 2020, 8:25am

Going further: can anyone look at my CustomJSTransform code for the mean and advise how to initialize a Span when the CDS is empty?

from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.models import Span
from bokeh.plotting import figure
from bokeh.transform import transform
from bokeh.models.transforms import CustomJSTransform

#--> mean
average_jst = '''
var total = 0;
for(var i = 0; i < CDS_lambda.data['y_data'].length; i++) {
    total += CDS_lambda.data['y_data'][i];
}
var avg = total / CDS_lambda.data['y_data'].length;
return avg;
'''

data_lambda = {'category': [],
               'y_data': []}
CDS_lambda = ColumnDataSource(data=data_lambda)

plot_1 = figure()
plot_1.add_layout(Span(location=transform('y_data', CustomJSTransform(v_func=average_jst)), source=CDS_lambda), dimension='width')

p-himik · June 16, 2020, 9:48am

Just return something like 'N/A' (for “Not Available”) when the length of the data is 0, because you can’t compute the mean of an empty collection.
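
One way to follow this advice is to guard the empty case before a value ever reaches the Span. The sketch below does the check on the Python side rather than in JavaScript, which is a swapped-in approach and not what the code in the question does. mean_or_none is a hypothetical helper, and it returns None instead of 'N/A' so the caller can test for it.

```python
from typing import Optional, Sequence


def mean_or_none(values: Sequence[float]) -> Optional[float]:
    """Return the arithmetic mean, or None when there is nothing to average."""
    if len(values) == 0:
        # An empty collection has no mean; signal that explicitly
        # instead of dividing by zero.
        return None
    return sum(values) / len(values)


print(mean_or_none([]))               # None
print(mean_or_none([3, 2, 5, 6, 2]))  # 3.6
```

In a Bokeh server app, a callback on the data source could call this helper, hide the Span while the result is None, and set its location otherwise. That wiring is an assumption and is not shown here.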