On-the-fly math with ColumnDataSource columns when plotting?

elnjensen · July 5, 2024, 9:51pm

Is it possible to do on-the-fly math with ColumnDataSource columns? In, e.g., matplotlib, if I have numpy arrays x and y, I could do something like plt.plot(x, y+5).

I’m working in bokeh and I’d like to draw a quad whose x location is specified by a column ‘X’ in my ColumnDataSource. But if I try something like this:

    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure

    p = figure(title="My plot")

    # df is a pandas dataframe
    cds = ColumnDataSource(df)
    p.quad(source=cds, bottom='START_INDEX', top='END_INDEX', left='X'-0.5,
           right='X'+0.5, color='blue', alpha=0.01)

then I get the error

TypeError: unsupported operand type(s) for -: 'str' and 'float'

because it thinks I am trying to subtract 0.5 from the string ‘X’ rather than the column named ‘X’.

On the other hand, if I do this:

    p.quad(source=cds, bottom='START_INDEX', top='END_INDEX', left=cds.data['X']-0.5,
           right=cds.data['X']+0.5, color='blue', alpha=0.01)

I no longer get the TypeError, but instead I get a RuntimeError about mixing data sources:

RuntimeError: 

Expected left and right to reference fields in the supplied data source.

Of course I can add new columns to my ColumnDataSource that are X - 0.5 and X + 0.5, but that involves some duplication of data.

Any simple trick I’m missing here? In truth it’s not a big deal since my data source in this example isn’t huge, but I’m curious what the best idiom is in cases like this.

Thanks in advance for any thoughts!

Bryan · July 5, 2024, 10:13pm

It’s always important to bear in mind that Bokeh is actually “(Python) Bokeh plus BokehJS”, and that almost all of the actual work is done in JavaScript, in the browser, by BokehJS. That means it’s often necessary to clarify the goals somewhat.

If by “on-the-fly math” you mean “on-the-fly math in Python” then there is no possible way to avoid “duplication” because ultimately all those values computed in Python “on-the-fly” will need to be serialized, i.e. copied, in order to be sent to BokehJS, which needs them concretely to actually do the drawing.
If instead you don’t care where the “on-the-fly” computation happens, then yes, you can define a CustomJSExpr, which encapsulates the computation itself in a serializable way so that the values are computed dynamically in the browser, in JavaScript. There’s an example here:

customjs_expr — Bokeh 3.5.0 Documentation
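A minimal sketch of the CustomJSExpr route applied to the quad from the question, assuming Bokeh 3.x, where `expr` is importable from `bokeh.core.properties`. The dataframe contents and the `shifted` helper are hypothetical. The sketch also assumes, as in the linked docs example, that inside the JS code `this` refers to the glyph’s data source and that the code body may `yield` one value per row. No shifted arrays are built in Python; BokehJS runs the loop in the browser on render.

```python
import json

import pandas as pd

from bokeh.core.properties import expr
from bokeh.models import ColumnDataSource, CustomJSExpr
from bokeh.plotting import figure

# Hypothetical stand-in for the dataframe in the question
df = pd.DataFrame({
    "START_INDEX": [0, 10, 20],
    "END_INDEX": [5, 15, 25],
    "X": [1.0, 2.0, 3.0],
})
cds = ColumnDataSource(df)


def shifted(column: str, offset: float) -> CustomJSExpr:
    """Hypothetical helper: a browser-side expression yielding column[i] + offset.

    Assumes `this` is bound to the glyph's data source when BokehJS runs
    the code, so `this.data[...]` reads a CDS column in the browser.
    """
    # The code body runs as a JS generator; each yielded value becomes one element.
    # json.dumps quotes the column name safely as a JS string literal.
    return CustomJSExpr(code=f"""
        const offset = {float(offset)!r}
        for (const v of this.data[{json.dumps(column)}]) {{
            yield v + offset
        }}
    """)


left_expr = shifted("X", -0.5)
right_expr = shifted("X", 0.5)

p = figure(title="My plot")
# Only the JS code is serialized; no X +/- 0.5 arrays are created in Python
renderer = p.quad(source=cds, bottom="START_INDEX", top="END_INDEX",
                  left=expr(left_expr), right=expr(right_expr),
                  color="blue", alpha=0.01)
```

Calling `show(p)` would render it. Note the explicit JavaScript loop standing in for what would be a one-line NumPy expression in Python.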

There’s no “best idiom” here, since there are tradeoffs involved either way:

CustomJSExpr will reduce “duplication”, will automatically respond if the source inputs change, but…

the re-computation happens on every render, which may be expensive, depending on the size of the data
the computations have to be expressed in JavaScript code, so e.g. explicit loops are required, since NumPy does not exist in JavaScript

You just have to choose what best suits your needs. I do think it’s fair to say that most users just put whatever data they need directly into the CDS, unless they get some specific benefit out of CustomJSExpr, since it’s arguably “more work” to ship the computation off to BokehJS.

All that said, I’m not sure that putting the x + 0.5 array in the CDS actually results in any more duplication than the MPL case. The actual concrete array x + 0.5 is constructed in memory in both cases. In the Bokeh case, the CDS holds a reference to (not a copy of) that array. It’s possibly more a question of lifetime, since MPL uses and discards that array “immediately”, whereas Bokeh keeps it around in the CDS as long as the CDS is around.
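For comparison, a sketch of the more common approach: precompute the edges once in Python and store them as extra CDS columns. The column names `LEFT` and `RIGHT` and the dataframe contents are made up for illustration.

```python
import numpy as np
import pandas as pd

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical stand-in for the dataframe in the question
df = pd.DataFrame({
    "START_INDEX": [0, 10, 20],
    "END_INDEX": [5, 15, 25],
    "X": [1.0, 2.0, 3.0],
})
cds = ColumnDataSource(df)

# Compute the shifted edges in Python and add them as new columns;
# the CDS keeps them alive for as long as the CDS itself exists
x = np.asarray(df["X"], dtype=float)
cds.add(x - 0.5, "LEFT")
cds.add(x + 0.5, "RIGHT")

p = figure(title="My plot")
# Every data argument now names a column in the same source
p.quad(source=cds, bottom="START_INDEX", top="END_INDEX",
       left="LEFT", right="RIGHT", color="blue", alpha=0.01)
```

The arrays `x - 0.5` and `x + 0.5` exist in Python memory here just as they would in `plt.plot(x, y+5)`; the difference is only that the CDS holds on to them.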

The CustomJSExpr option is strictly an improvement over what is possible with MPL (in terms of total Python process memory usage) but also has its own tradeoffs as described above.

Bryan · July 5, 2024, 10:25pm

I should also add that you can pass “dynamic” values in Python, as long as you pass all the values that way and don’t use an explicit CDS at all. For example, you can do this:

    # t, b, and x are all arrays; no source arg is passed
    p.quad(bottom=b, top=t, right=x+0.5, left=x-0.5)

If you do this, Bokeh will actually just create a CDS behind the scenes for you with those arrays for the columns. (There is always a CDS—it is the unit of serialization for glyph data).
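A sketch of the all-bare-arrays form with made-up arrays. Inspecting `renderer.data_source` shows the CDS that Bokeh created behind the scenes. That its columns are named after the glyph arguments (`left`, `right`, ...) is what current Bokeh versions appear to do, but treat the column naming as an implementation detail rather than documented API.

```python
import numpy as np

from bokeh.plotting import figure

# Hypothetical example arrays; no ColumnDataSource is created by hand
b = np.array([0, 10, 20])
t = np.array([5, 15, 25])
x = np.array([1.0, 2.0, 3.0])

p = figure(title="My plot")
# Every data argument is a bare array, so Bokeh builds the CDS itself
renderer = p.quad(bottom=b, top=t, left=x - 0.5, right=x + 0.5,
                  color="blue", alpha=0.01)

# List the columns of the implicitly created data source
print(sorted(renderer.data_source.data))
```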

What you can’t do is “mix and match” like you had above:

    p.quad(source=cds, color='blue', alpha=0.01,

           # these refer to CDS columns
           bottom='START_INDEX', top='END_INDEX',

           # these are "bare arrays"
           left=cds.data['X']-0.5, right=cds.data['X']+0.5)

That used to be permitted a very long time ago, before it was intentionally disallowed. It was disallowed because users could create a CDS with an existing column like "top" but then pass in “bare arrays” to the glyph for the top coordinate. The only option in that case was to replace the "top" column in their original CDS to make things “work”. But modifying user data like that is always confusing and bad, so attempting to “mix and match” was explicitly made an error instead.
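A small sketch that reproduces the “mix and match” error from the question and catches it so the message can be inspected. The dataframe contents are hypothetical; the expected message fragment is the one the original poster quoted.

```python
import pandas as pd

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical stand-in for the dataframe in the question
df = pd.DataFrame({
    "START_INDEX": [0, 10, 20],
    "END_INDEX": [5, 15, 25],
    "X": [1.0, 2.0, 3.0],
})
cds = ColumnDataSource(df)
p = figure(title="My plot")

try:
    # Column names for bottom/top but bare arrays for left/right: rejected
    p.quad(source=cds, bottom="START_INDEX", top="END_INDEX",
           left=cds.data["X"] - 0.5, right=cds.data["X"] + 0.5)
except RuntimeError as err:
    message = str(err)
else:
    message = ""

print(message)
```

Bokeh refuses up front rather than silently writing `left`/`right` columns into the user’s `cds`.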

Lastly, right='X'+0.5 can’t be made to work, because Bokeh can’t change how Python treats literals.
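To make the point about literals concrete: Python evaluates `'X' - 0.5` before `p.quad` is even called, so no library can intercept it. A string like `'X'` is only a column name that Bokeh looks up later; arithmetic on it is plain string arithmetic. This sketch needs no Bokeh import at all:

```python
# Python evaluates the argument expression before any plotting call runs,
# so this fails no matter which library the result would be passed to
try:
    left = "X" - 0.5
except TypeError as err:
    message = str(err)

print(message)  # unsupported operand type(s) for -: 'str' and 'float'
```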

elnjensen · July 5, 2024, 11:30pm

This is all very useful, thanks! I appreciate the quick help.

Regarding your second point about “mix and match” not being allowed, I should also note that the error message in that case is very informative. I didn’t paste in the whole thing, but it makes that idea pretty clear.

Thanks again!

system · October 3, 2024, 11:31pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.