Work with multiple scatter plot selection callback

guillaume.androz · September 5, 2019, 1:33pm

I have a graph with multiple scatter plots, each drawn from its own ColumnDataSource. I would like to add a selection callback to it, but I have several issues.

First, when I use the box selection tool, if the selected area does not include any point of a data source, all the points of that data source remain “active”, i.e. visually no alpha is applied to them.
Secondly, the callback object returns an indices array, but what do those indices refer to? I would have expected an array of arrays, one array per data source/plot.

Basically, I would like to reproduce the example here but with several data sources

The workaround I found is to use one callback per data source, but it is not very pleasant to use. Furthermore, it does not solve the first issue.

carolyn · September 5, 2019, 3:50pm

You may be able to solve the first issue by using a single ColumnDataSource with multiple CDSViews. I think the fundamental issue is that the selection tool “belongs to” and acts upon one ColumnDataSource, so a solution is probably going to involve combining your data sources into one.

I was able to get something working with the following. I’m not saying it’s the most beautiful solution, but it may give you some ideas to start:

from random import random

from bokeh.layouts import row
from bokeh.models import CustomJS, ColumnDataSource, CDSView, BooleanFilter
from bokeh.plotting import figure, show

blank = [None] * 50
x1 = [random() for x in range(50)] + blank
y1 = [random() for y in range(50)] + blank
x2 = blank + [random() for x in range(50)]
y2 = blank + [random() for y in range(50)]

s1 = ColumnDataSource(data=dict(x1=x1, y1=y1, x2=x2, y2=y2))
# True for the rows that belong to each subset (the other subset's rows are None)
boolean_1 = [x_val is not None for x_val in s1.data['x1']]
boolean_2 = [x_val is not None for x_val in s1.data['x2']]
view_1 = CDSView(source=s1, filters=[BooleanFilter(boolean_1)])
view_2 = CDSView(source=s1, filters=[BooleanFilter(boolean_2)])
p1 = figure(plot_width=400, plot_height=400, tools="lasso_select")
p1.circle('x1', 'y1', source=s1, view=view_1, color="red", alpha=0.6)
p1.circle('x2', 'y2', source=s1, view=view_2, color="green", alpha=0.6)

s1.selected.js_on_change('indices', CustomJS(args=dict(s1=s1), code="""
        // do whatever
    """)
)

show(p1)

Bryan · September 5, 2019, 3:59pm

the callback object returns an indices array, but to what do those indices refer

Hi @guillaume.androz I am not sure what indices array you are referring to. The callback is supplied with two arguments:

the model itself, in this case the BoxSelectionTool
a model-specific data payload object, in this case {g: geometry} that supplies the spatial coordinates of the box.

There are no selection indices passed to HoverTool.callback, as can be verified in the source code. (In general, actual code and sample output are strongly recommended to focus the discussion; at this point I can only speculate about what you are referring to.)

I would strongly advise against using the .callback mechanism in any case. It was replaced by the general .js_on_change facility, which can be used to uniformly trigger callbacks on any Bokeh property change. These ad-hoc .callback properties date back to the very early days of the project and are sprinkled inconsistently around. They have been deprecated for some time and will be removed in a future (not too distant) release.

If you want to respond to changes in scatter selection indices, you should do:

source.selected.js_on_change('indices', customjs_callback)

for every data source that you care about. (Or if you do want the actual box geometry, you can use .js_on_event with BoxSelection event type)
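As a rough illustration of "for every data source", here is a minimal sketch that attaches the same JavaScript body to the selection of each source. The two sources, their `name` values, and the `console.log` body are made up for the example; only `ColumnDataSource`, `CustomJS`, and `selected.js_on_change` come from the discussion above.

```python
from bokeh.models import ColumnDataSource, CustomJS

# Hypothetical sources of different lengths, one per scatter renderer.
sources = [
    ColumnDataSource(data=dict(x=[1, 2, 3], y=[4, 5, 6]), name="first"),
    ColumnDataSource(data=dict(x=[1, 2], y=[7, 8]), name="second"),
]

# Shared JavaScript body. cb_obj is the Selection model whose `indices`
# property changed; `source` is passed in through args so the callback
# can tell which data source the indices belong to.
code = """
    console.log(source.name, cb_obj.indices);
"""

callbacks = {}
for source in sources:
    callback = CustomJS(args=dict(source=source), code=code)
    source.selected.js_on_change("indices", callback)
    callbacks[source.name] = callback
```

Each callback then only ever sees the indices of its own source, so there is no ambiguity about which rows they point to.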

if the selected area does not include one point of a data source, all the points of this data source remain “active” i.e. visually there is not alpha applied on them.

The selection/non_selection visual properties only apply when there is some non-empty selection to begin with. If you make an empty selection, that is the same as clearing the selection, in which case the normal glyph is used. I.e. when a plot is first displayed, before any other action, it has an empty selection. Making a new empty selection some time later returns it to that same initial state. I am not sure what else could be done that would not be inconsistent/confusing.

If you really have to have a different behaviour, you could create a custom extension that meets your specialized need.

guillaume.androz · September 5, 2019, 4:55pm

Thanks for your reply. I wanted to use one data source, but my sources do not have the same dimension (e.g. 1000 points in the first and 1500 in the other).

guillaume.androz · September 5, 2019, 5:00pm

Thanks Bryan, I was indeed confused! Soon after, I realized that in source.selected.js_on_change('indices', customjs_callback), the indices refer to the row indices of the source model…
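To make that concrete, here is a small plain-Python sketch of how a list of selected indices maps onto the columns of one source. The helper name `rows_for_selection` and the sample data are hypothetical; the dict stands in for what `source.data` would hold.

```python
def rows_for_selection(data, indices):
    """Return the selected rows of a column dict as a list of per-row dicts."""
    return [{col: values[i] for col, values in data.items()} for i in indices]


# Stand-in for source.data: every column has the same length.
data = {"x": [0.1, 0.5, 0.9], "y": [1.0, 2.0, 3.0]}

# Stand-in for source.selected.indices after a box selection.
selected = rows_for_selection(data, [0, 2])
```

Here `selected` holds the first and third rows of that one source, which is why a separate callback per source yields a separate indices array per source.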

carolyn · September 5, 2019, 5:14pm

Sure. They can be different lengths; you would just need to create matching blank columns of the same length. Below I’m arbitrarily choosing 45 and 57, but they can be anything:

x1 = [random() for x in range(45)]
y1 = [random() for y in range(45)]

x2 = [random() for x in range(57)]
y2 = [random() for y in range(57)]

blank_1 = [None] * len(x1)
blank_2 = [None] * len(x2)

s1 = ColumnDataSource(data=dict(x1=x1 + blank_2, y1=y1 + blank_2, x2=blank_1 + x2, y2=blank_1 + y2))

The end result would just need to be a dataframe with the following format:

x1	y1	x2	y2
a	b	_	_
c	d	_	_
e	f	_	_
_	_	g	h
_	_	i	j
_	_	k	l
_	_	m	n

So it doesn’t matter how many rows are in each subset of data; just that the columns that don’t apply are set to None, which is what I’m using those blank_1 and blank_2 arrays for.
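The padding idea above could be generalized to any number of subsets, and the combined row indices mapped back to each subset, along these lines. This is only a sketch in plain Python: `combine_groups` and `split_indices` are hypothetical helper names, and it assumes each subset's x and y lists have equal length.

```python
def combine_groups(groups):
    """Stack (xs, ys) pairs into one column dict, padding unused cells with None.

    Returns (data, offsets), where offsets[k] is the first combined row of group k.
    """
    total = sum(len(xs) for xs, _ in groups)
    data, offsets, start = {}, [], 0
    for k, (xs, ys) in enumerate(groups, start=1):
        before = [None] * start
        after = [None] * (total - start - len(xs))
        data[f"x{k}"] = before + list(xs) + after
        data[f"y{k}"] = before + list(ys) + after
        offsets.append(start)
        start += len(xs)
    return data, offsets


def split_indices(indices, offsets, total):
    """Map combined-row indices to one list of local indices per group."""
    bounds = offsets + [total]
    per_group = [[] for _ in offsets]
    for i in indices:
        for k in range(len(offsets)):
            if bounds[k] <= i < bounds[k + 1]:
                per_group[k].append(i - bounds[k])
                break
    return per_group


# Two subsets of different lengths (3 and 2 points).
data, offsets = combine_groups([([1, 2, 3], [4, 5, 6]), ([7, 8], [9, 10])])
per_group = split_indices([0, 2, 4], offsets, total=5)
```

The `data` dict could be passed to a single ColumnDataSource, and `split_indices` gives the "array of arrays" view of a selection: one list of local indices per subset.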