Puzzled about BoxSelectTool

Kwyjibeuss · November 14, 2020, 4:46pm

Hello,

I am having a hard time understanding how the BoxSelectTool works. In particular, I don’t understand why the same area is not highlighted in the two plots… See picture.

Cheers,

Sébastien.

swamilikes2code · November 14, 2020, 11:16pm

It depends on how your data is formatted in the ColumnDataSource. The selection highlights the rows that are common to both plots. In your case, I am assuming you are plotting cos(x) and sin(x). When one is close to 0, the other is close to 1 or -1. It is performing exactly as expected.
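A minimal sketch of that setup, assuming (as guessed above) that the two plots show cos(x) and sin(x) drawn from one shared ColumnDataSource. The column names and the number of points are made up for illustration, and this is not the original poster's code.

```python
from math import cos, sin, pi

from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Assumed data: 50 samples of x in [0, 2*pi]; column names are hypothetical.
n = 50
x = [2 * pi * i / (n - 1) for i in range(n)]
source = ColumnDataSource(data=dict(
    x=x,
    cos_x=[cos(v) for v in x],
    sin_x=[sin(v) for v in x],
))

tools = "box_select,lasso_select,reset"

# Both plots read from the SAME source, so they share one selection.
p1 = figure(tools=tools, title="cos(x)")
p1.scatter("x", "cos_x", source=source)

p2 = figure(tools=tools, title="sin(x)")
p2.scatter("x", "sin_x", source=source)

layout = gridplot([[p1, p2]])

# A box select stores row indices on the source, not a rectangle.
# Setting them by hand mimics a selection of the first three rows.
source.selected.indices = [0, 1, 2]

# Uncomment to open the plots in a browser:
# from bokeh.plotting import show
# show(layout)
```

Rows 0 to 2 sit near cos(x) = 1 on the first plot and near sin(x) = 0 on the second, so the highlighted points land at different heights even though they are the same rows.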

How is this useful? Assume you have a 2x2 grid of plots for city data. The four plots show population, density, elevation, and a map with all the cities. You can use box select or lasso select to focus on cities of interest. You can check out an example here: Python_based_Interactive_Scientific_Visualization/getting_started_bokeh_pandas at master · swamilikes2code/Python_based_Interactive_Scientific_Visualization · GitHub

Bryan · November 15, 2020, 10:54pm

Just to elaborate in one way: the user draws a rectangular region to select from, but the actual information that a selection represents is the set of indices into the data array. And when you share a selection across plots, that means: highlight the data points with the same indices (which may be in a completely different region).

Another way to think about why it works this way, instead of sharing the “selection region coordinates”: there is no reason that a second plot has to have axis ranges that overlap the first plot’s at all. In general it cannot be assumed that the actual rectangle coordinates are visible anywhere on another plot.
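To make that concrete, here is a rough pure-Python sketch of the idea (not Bokeh's actual implementation): the box is reduced to a list of indices, and the second plot only ever sees those indices. The helper `indices_in_box` and all the data values are hypothetical.

```python
def indices_in_box(xs, ys, x0, x1, y0, y1):
    """Return the indices of the points that fall inside the box."""
    return [
        i
        for i, (x, y) in enumerate(zip(xs, ys))
        if x0 <= x <= x1 and y0 <= y <= y1
    ]


# Hypothetical columns: plot 1 shows (a, b), plot 2 shows (c, d).
a = [0.0, 1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, 4.0, 9.0, 16.0]
c = [100.0, 200.0, 300.0, 400.0, 500.0]
d = [-5.0, -4.0, -3.0, -2.0, -1.0]

# Box drawn on plot 1.
selected = indices_in_box(a, b, x0=0.5, x1=3.5, y0=0.5, y1=10.0)
print(selected)  # [1, 2, 3]

# Plot 2 highlights the same rows, wherever they happen to be.
highlighted_on_plot2 = [(c[i], d[i]) for i in selected]
print(highlighted_on_plot2)  # [(200.0, -4.0), (300.0, -3.0), (400.0, -2.0)]

# Reusing the box coordinates on plot 2 would select nothing at all,
# because its axis ranges do not overlap the box.
print(indices_in_box(c, d, x0=0.5, x1=3.5, y0=0.5, y1=10.0))  # []
```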

Kwyjibeuss · November 16, 2020, 10:25pm

Hey,

Thanks a lot. I am still rather new to coding and data visualization. I guess that, at least for me, the rectangle selection was misleading, since one actually selects the x-values.

Unfortunately, I could not open your example. But it seems interesting,

Cheers,

Sébastien.

Kwyjibeuss · November 16, 2020, 10:32pm

Hi Bryan,

Thanks a lot. I am still rather new to coding and data visualization.
I was assuming that the rect selection consisted of selecting, in the first plot, a rectangle of 2D coordinates (like… a rectangle!): [[x1, y1], [x2, y2]], and that it would highlight the area with these coordinates in the second plot, adapting the scale. But in fact, one selects an [x1, x2] interval in one plot and the corresponding [y1, y2] are highlighted in the two plots.
Is that correct?

Cheers,

Sébastien.

Bryan · November 16, 2020, 10:56pm

But in fact, one selects an [x1, x2] interval in one plot and the corresponding [y1, y2] are highlighted in the two plots.

@Kwyjibeuss No, that is definitely not correct. It’s not about the x and y ranges at all. The selection box ultimately generates array indices, and any plots that share the selection highlight all the points corresponding to those same array indices.

As a concrete example: suppose both plots show 10 circles, and you use the box select tool to select the point on the first plot with index 6. Then the point on the second plot with array index 6 will be highlighted, wherever it might be.

The point of selections behaving this way is that it lets you see and compare corresponding points across different data columns. The fact that they are in different locations is usually exactly what is desirable to see.

It also makes it possible to compare plots that have wildly different axes. Imagine a bunch of loan data plotted by delinquency and amount on one plot, and plotted geographically on another plot. You could select all the most delinquent loans on the first plot and then see that, maybe, the most delinquent loans tend to come from one geographic area. In this case it doesn’t even make sense to talk about using the x-y range from the first plot on the second; they are not at all commensurable. The thing that is common between the plots, and that can be used to compare them, is the row indices of the data points.
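A toy version of the loan scenario in plain Python, just to show the row indices doing the work. Every value below is made up for illustration, and the 90-day threshold is an arbitrary assumption standing in for the box drawn on the first plot.

```python
from collections import Counter

# Made-up rows, one loan per index (illustration only).
days_delinquent = [5, 120, 0, 95, 10, 150]
region = ["north", "south", "north", "south", "east", "south"]

# "Box select" on the delinquency plot: loans at or above 90 days.
selected = [i for i, d in enumerate(days_delinquent) if d >= 90]
print(selected)  # [1, 3, 5]

# The geographic plot highlights the same rows; tally where they are.
regions_of_selected = Counter(region[i] for i in selected)
print(regions_of_selected)  # Counter({'south': 3})
```

The delinquency axis and the map share no coordinates, yet the selection carries over because both plots are indexed by the same rows.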

Kwyjibeuss · November 17, 2020, 7:27pm

Thanks! I think I understand what you mean. I guess in my example, the x values happen to correspond to indices, but it is not correct to say that we selected an x-interval.

Let me check if I understood correctly. Let’s say we have two plots with 10 circles each, with the following coordinates:

x = [1,2,3,4,5,6,7,8,9,10]

y1 = [1,2,3,4,5,6,7,8,9,10]

y2 = [10,9,8,7,6,5,4,3,2,1]

Plot 1 has the x/y1 circles and plot 2 has the x/y2 ones.

So, if I select the circle with index 6 in plot 1, I will highlight [7,7] (plot 1) and [7,6] (plot 2).

Cheers,

Seb.

_jm · November 17, 2020, 8:18pm

In your numerical example, selecting a circle with index 6 will correspond to the point (7,7) in the first plot as you surmise.

However, the circle at (7,4) will be selected in the second plot. y2[6] = 4 in your data.
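A quick way to check this in plain Python, using the lists exactly as posted (indices are 0-based):

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y2 = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# The selection is just a list of row indices.
selected = [6]

plot1_points = [(x[i], y1[i]) for i in selected]
plot2_points = [(x[i], y2[i]) for i in selected]

print(plot1_points)  # [(7, 7)]
print(plot2_points)  # [(7, 4)]
```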

Kwyjibeuss · November 17, 2020, 8:37pm

Yes, you’re right. It was a typo; I typed 6 because my mind was focused on « index 6 ».

Thanks