Access synthetic coordinate mapping to calculate "in-between" points

Hi everyone!

I’ve been using Bokeh for a while now and I’ve been very happy with its customizability and speed. A new requirement has come up in a fairly complex existing project, and I would like to implement the changes without a complete overhaul of the data and structure.

For context, I am using Bokeh to visualize the movement of trains in a railway network. The x-axis contains two-level categorical data, with the stations being the first level and the platforms being the second level. The y-axis is a datetime axis. Now, I want to model delays and stoppages at specific points in between two stations. So an example of what I want to do would look something like this:

Now, I’m using a FactorRange to define my x_range. If I understand it correctly, the FactorRange is converted to a numerical synthetic coordinate range under the hood. However, the mapping is managed on the JS side, correct?

I need to access specific points in between two of my x-coordinates. For example, I would like to get the x-coordinate of a point that is 40% of the way between (Station 1, Platform 1) and (Station 2, Platform 1). I’ve found a very hacky way to do this using offsets, but it makes a lot of assumptions about the mapping of multi-level categories and the padding of my FactorRange. It is very error-prone and tends to fall apart if the two stations are not direct neighbours on the x-axis. Is there any way I can achieve this in a more robust manner, for example by somehow accessing the synthetic coordinate mapping on the Python side? I’ve attached a minimal code example with some comments about what I would like to do.

Thanks in advance for your help!

EDIT: I just realized something I should add: in the project I am talking about, I’m running a Bokeh server by manually creating and starting a Tornado IOLoop from the code that receives updates in real time. So in theory some type of bidirectional communication between the Python and JS sides should be possible.

EDIT2: I’ve been digging through the JS source for a bit and it turns out that the FactorMapper’s mapping attribute seems to be exactly what I’m looking for. There is no simple way to access it from the Python side of things without changing both the JS and Python source, is there? :sweat_smile:

from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.models.formatters import DatetimeTickFormatter

x_range = [("Station 1", "P1"), ("Station 1", "P2"), ("Station 2", "P1"), ("Station 2", "P2")]
x_values = [("Station 1", "P1"), ("Station 2", "P1")]
y_values = [30000000, 30300000]
cds_dict = {"x": x_values,
            "y": y_values}

# Here I would like to do something like the following
# (x_synth_mapping and interpolate do not exist; this is what I am looking for)
# x1_synth_coords = x_synth_mapping[("Station 1", "P1")]
# x2_synth_coords = x_synth_mapping[("Station 2", "P1")]
# x_between = interpolate(x1_synth_coords, x2_synth_coords, 0.4)
# x_values = [("Station 1", "P1"), x_between, x_between, ("Station 2", "P1")]
# y_values = [30000000, 30200000, 30300000, 30400000]

source = ColumnDataSource(data=cds_dict)

p=figure(x_range=FactorRange(*x_range, group_padding=5), y_axis_type="datetime", width=400, height=300)
p.y_range.flipped = True
p.yaxis.formatter = DatetimeTickFormatter(minutes='%H:%M')

p.line(x="x", y="y", source=source)
show(p)

Yes, everything is handled on the JS side, and nothing is exposed to Python at all. The process for computing the synthetic coordinates is somewhat complicated, but in case you want to dig into it for any insights, the code for mapping two-level factors is here:

bokeh/bokehjs/src/lib/models/ranges/factor_range.ts at 81be65bfddc43ba57730a2014db9f71a4a75a736 · bokeh/bokeh · GitHub

Regarding a solution with offsets, it’s hard to speculate where you might be running into issues without actual code to look at. One suggestion that comes to mind is to set the group_padding on the factor range to zero (as well as any other padding properties). The idea is that without any padding added, the width of every bottom-level factor should be exactly 1.0, which may make your own computations for what offsets to use more robust.
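
As a rough sketch of that idea: if every padding on the FactorRange is zero, consecutive bottom-level factors should sit exactly 1.0 apart, so the offset for a point some fraction of the way between two factors is just that fraction times the difference of their positions in the factor list. The helper name `between` is made up for illustration, and the `(top, sub, offset)` form assumes Bokeh accepts a trailing numeric offset on a multi-level factor, which is worth checking against the version in use.

```python
def between(factors, start, end, fraction):
    """Return `start` with a numeric offset landing `fraction` of the way to `end`.

    Assumes all FactorRange paddings are zero, so consecutive bottom-level
    factors are exactly 1.0 apart in synthetic coordinates.
    """
    steps = factors.index(end) - factors.index(start)
    return (*start, fraction * steps)


factors = [("Station 1", "P1"), ("Station 1", "P2"),
           ("Station 2", "P1"), ("Station 2", "P2")]

# 40% of the way from (Station 1, P1) to (Station 2, P1)
point = between(factors, ("Station 1", "P1"), ("Station 2", "P1"), 0.4)
```

Because the offset is derived from list positions rather than hard-coded, this should keep working when the two stations are not direct neighbours, as long as the paddings stay at zero.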

More broadly… “lines” don’t really have meaning on a categorical scale. Since categories are discrete there is nothing “in between” them, so a notion of a continuous line between categories does not actually make sense. If I were looking at something like this, I might consider large scatter points for the times at each station, and then model “breaks” on the y-scale with box annotations or rects that span the width of the plot.

Hi Bryan,

Thanks for your response! Yeah, I stumbled upon that computation while digging through the JS source and my current idea is to replicate the mapping method on the Python side to have a copy of the mapping, so to speak. It would violate god knows how many software development principles but for now it’s probably the simplest way to achieve what I’m after :sweat_smile:
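
For anyone finding this later, here is a minimal sketch of what such a Python-side copy might look like. It follows my reading of the two-level mapping in the linked factor_range.ts, so treat it as an approximation to verify against your Bokeh version rather than an official API. The function names `map_two_levels` and `interpolate` are my own; `group_padding` and `factor_padding` correspond to the FactorRange properties of the same name. It also assumes that plain numbers in the x column are interpreted as synthetic coordinates.

```python
def map_two_levels(factors, group_padding=1.4, factor_padding=0.0, offset=0.0):
    """Approximate the synthetic x-coordinate of each (top, sub) factor."""
    # Group sub-factors under their top-level factor, preserving order.
    groups = {}
    for top, sub in factors:
        groups.setdefault(top, []).append(sub)

    mapping = {}
    suboffset = offset
    for top, subs in groups.items():
        # Each sub-factor is 1.0 wide and centred, plus any factor padding.
        for i, sub in enumerate(subs):
            mapping[(top, sub)] = 0.5 + i * (1 + factor_padding) + suboffset
        subpad = (len(subs) - 1) * factor_padding
        # The next group starts after this group's width plus the group padding.
        suboffset += len(subs) + group_padding + subpad
    return mapping


def interpolate(a, b, fraction):
    """Linear interpolation between two synthetic coordinates."""
    return a + fraction * (b - a)


x_range = [("Station 1", "P1"), ("Station 1", "P2"),
           ("Station 2", "P1"), ("Station 2", "P2")]
x_synth_mapping = map_two_levels(x_range, group_padding=5)

x_between = interpolate(x_synth_mapping[("Station 1", "P1")],
                        x_synth_mapping[("Station 2", "P1")], 0.4)
```

The obvious downside is that the copy has to be kept in sync by hand if the JS implementation ever changes.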

I kinda want to keep the group_padding for visual reasons, but I might have gained a better understanding of how to calculate the offsets from looking at that code snippet as well.

I agree that modelling the stations as categories connected by lines doesn’t really make sense at all. But you know how it is, start with a prototype where the automatic generation and nesting of the x-range is convenient and everything grows from there until it’s too late :sweat_smile:. If I got to do it all over again I would probably use a distance metric for a continuous scale and place ticks or annotations where the stations are supposed to be.
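
Roughly what I have in mind for the continuous version, as a sketch only: the station names and leg distances below are made up, and `point_between` is a hypothetical helper. Each station gets a numeric position from cumulative distance, and an in-between point is then plain linear interpolation.

```python
from itertools import accumulate

# Made-up example data: stations in order and the distance of each leg in km.
stations = ["Station 1", "Station 2", "Station 3"]
leg_km = [12.0, 8.5]

# Cumulative distance from the first station gives each station's x position.
positions = dict(zip(stations, accumulate(leg_km, initial=0.0)))


def point_between(a, b, fraction):
    """x position `fraction` of the way from station `a` to station `b`."""
    return positions[a] + fraction * (positions[b] - positions[a])


# A stoppage 40% of the way from Station 1 to Station 2
x_delay = point_between("Station 1", "Station 2", 0.4)
```

The values in `positions` could then be used as fixed tick locations on a linear x-axis, with the station names supplied as tick label overrides, although I have not tried that in this project.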
