CustomJSTransform NOT agnostic of CDS length?

This is something of an extension of this question: CustomJS for selected indicies after region selection in multi_line model. An answer would probably largely resolve my motive behind this feature request: [FEATURE] MultiScatter · Issue #12367 · bokeh/bokeh · GitHub

What I tried to do was come up with a way of plotting the MultiLine coordinates as individual points on a Scatter WITHOUT duplicating the data (i.e. without making two CDS’s, one containing the list of lists/unflattened data, and another containing the same thing but flattened).

My idea was to use a CustomJSTransform that flattens the “list of lists”:

from bokeh.plotting import figure, show, save
from bokeh.transform import transform
from bokeh.models import MultiLine, ColumnDataSource, CustomJS, CustomJSTransform, CustomJSExpr
from bokeh.core.properties import expr
import numpy as np


x = [[1,2,3,4,5], [1,2,3,4,5]]
y = [[8,6,5,2,3], [3,2,5,6,8]]

s1 = ColumnDataSource(data=dict(x0=x, y0=y))

f = figure(tools='lasso_select')

r=f.multi_line(xs='x0',ys='y0',source=s1)

tr = CustomJSTransform(v_func='''
                        return xs.flat()
                        ''')
rs = f.scatter(x=transform('x0',tr),y=transform('y0',tr)
                ,fill_color='red',source=s1)

show(f)

The result is that only the first two points of the Scatter are plotted. This seems to be because the columns of s1.data have a length of two, whereas the transformed/flattened data actually has a length of 10.

If I try a “variation” of this, where I create a separate CDS with identical column names to drive the flattened result, and pass the original “unflattened” CDS into the CustomJSTransform:

from bokeh.plotting import figure, show, save
from bokeh.transform import transform
from bokeh.models import MultiLine, ColumnDataSource, CustomJS, CustomJSTransform, CustomJSExpr
from bokeh.core.properties import expr
import numpy as np


x = [[1,2,3,4,5], [1,2,3,4,5]]
y = [[8,6,5,2,3], [3,2,5,6,8]]

s1 = ColumnDataSource(data=dict(x0=x, y0=y))
sf = ColumnDataSource(data=dict(x0=[],y0=[]))

f = figure(tools='lasso_select')

r=f.multi_line(xs='x0',ys='y0',source=s1)

xtr = CustomJSTransform(args=dict(s1=s1)
                       ,v_func='''
                        return s1.data['x0'].flat()
                        ''')
ytr = CustomJSTransform(args=dict(s1=s1)
                       ,v_func='''
                        return s1.data['y0'].flat()
                        ''')
rs = f.scatter(x=transform('x0',xtr),y=transform('y0',ytr)
                ,fill_color='red',source=sf)

show(f)

As I half expected, I get no scatter points plotted, because really I’m just doing the same thing with extra steps, except this time I’m relying on an empty (zero-length) CDS.

Finally, I found I can get the above setup to work, but ONLY if I instantiate my “flat source” (sf) with data equal in length to the flattened result. I can fill it initially with zeros, for example:

l = len([item for sublist in x for item in sublist])  # length of the flattened data
sf = ColumnDataSource(data=dict(x0=np.zeros(l),y0=np.zeros(l)))

But in cases where I have tens of thousands of coordinates, I don’t want to write tens of thousands of 0s to the HTML output just to provide a completely redundant initial state. Is there a key mechanic I’m missing, or is this something I simply can’t do with the available built-in tools right now?

I’ll also mention I found an example here → bokeh/customjs_expr.py at branch-2.4 · bokeh/bokeh · GitHub using CustomJSExpr that also seems like it might be “harnessable” to achieve this, with the significant difference from my above attempts being that it demonstrates how to build a custom DataModel as well… I’m wondering if a custom DataModel could be used to do what I’m after, and if so, some hints/guidance on that would be swell… Thanks!

Data size is determined only from data source column sizes. The effect is as you observed, that any manipulation of the data with transforms that change the size of the data will result in truncated plots. This is understandable given the overall design. Glyphs (or glyph renderers) are data source driven and so are all the related features like selections. Given that data sources can be used by multiple glyph renderers, we cannot allow glyphs to dictate data size.
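To make the size mismatch concrete, here is a minimal sketch in plain Python (no Bokeh involved). It assumes, based on the behaviour reported in the question, that the values actually drawn are the leading entries of the transform output, cut off at the column length; the variable names are illustrative only.

```python
from itertools import chain

x = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]

# What the data source reports: one entry per line, so the column length is 2.
column_length = len(x)

# What a flattening transform returns: one entry per point, so the length is 10.
flattened = list(chain.from_iterable(x))
flattened_length = len(flattened)

# Assumption: only the first `column_length` flattened values end up drawn.
drawn = flattened[:column_length]

print(column_length, flattened_length, drawn)
```

The same reasoning would explain the second attempt in the question: an empty source has a column length of zero, so nothing is drawn regardless of what the transform returns.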

The solution is to add support for derived data sources, where transforms would be applied in a data source and not a glyph. Glyph transforms would still be allowed with this, but only for non-size changing transforms. This was (give or take) proposed in [FEATURE] Derived data sources for selections and viewports · Issue #10155 · bokeh/bokeh · GitHub, though there aren’t many details in that issue. Having derived data sources would likely also solve the problem of glyphs driven exclusively by expressions (or data generators), which currently suffers from the same problem.
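As a rough illustration of what a derived data source would have to compute, the sketch below flattens the list-of-lists columns and records which line each point came from. `derive_flat_columns` is a hypothetical helper, not a Bokeh API, and it runs in Python, so it does duplicate the data, which is exactly what the question hoped to avoid. It only shows the shape of the derived result.

```python
from itertools import chain


def derive_flat_columns(xs, ys):
    """Flatten list-of-lists columns and record each point's source line.

    Hypothetical helper for illustration; not part of Bokeh.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of lines")
    for line_x, line_y in zip(xs, ys):
        if len(line_x) != len(line_y):
            raise ValueError("each line needs equally long x and y lists")

    return {
        "x": list(chain.from_iterable(xs)),
        "y": list(chain.from_iterable(ys)),
        # Index of the originating line, repeated once per point on that line.
        "line_index": [i for i, line in enumerate(xs) for _ in line],
    }


x = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
y = [[8, 6, 5, 2, 3], [3, 2, 5, 6, 8]]

flat = derive_flat_columns(x, y)
print(len(flat["x"]), flat["line_index"])
```

The returned dict has columns of equal length, so it could be passed as the `data` of a second ColumnDataSource for the scatter glyph. The `line_index` column is one possible way to map a scatter selection back to the corresponding line.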

Thanks so much for the explanation. I struggle sometimes to find the succinct language of what I’m asking about but you nailed it: I’m trying to make a glyph driven exclusively by an expression, derived from another CDS (i.e. a derived datasource).

I very much recognize that this is a complex nuance way out of my league and best left to the pros to ponder and (hopefully) implement. Thanks again.