Python callback triggered by SelectionGeometry event is slow compared to js equivalent

Dear bokeh community,

I am trying to use a Python callback triggered by the SelectionGeometry event on an image_rgba glyph, but it is very slow. The same thing with a JS callback works nicely.

This behaviour is present for both lasso_select and box_select.
I tried it on Safari and Chrome, using Bokeh version 2.3.1.

Code example

from bokeh.models import CustomJS
from bokeh.plotting import figure, output_notebook, show
from bokeh.events import SelectionGeometry
from bokeh.application import Application
from bokeh.application.handlers import FunctionHandler
import numpy as np

output_notebook()

# Choose 'python' or 'js'
case = 'python'

def f(doc):

    N = 1000
    img = np.empty((N,N), dtype=np.uint32)
    view = img.view(dtype=np.uint8).reshape((N, N, 4))
    for i in range(N):
        for j in range(N):
            view[i, j, 0] = int(i/N*255)
            view[i, j, 1] = 158
            view[i, j, 2] = int(j/N*255)
            view[i, j, 3] = 255
    plot = figure(tools="lasso_select, box_select, wheel_zoom")
    plot.image_rgba(image=[img], x=0, y=0, dw=10, dh=10)

    if case == 'python':
        plot.on_event(SelectionGeometry, lambda x: None)
        
    else:
        plot.js_on_event(SelectionGeometry, CustomJS(code=""" """))

    doc.add_root(plot)
    
    return doc
    
handler = FunctionHandler(f)
app = Application(handler)

show(app)

Here is a screenshot of the desired result.

Am I doing something wrong?

Thank you,
Ben

@benjamin I don’t really notice any slowdown with the box tool, at least on my system. But regardless, there is definitely more fixed overhead in the on_event case than in the js_on_event case. In the latter, a callback function in the same process is immediately triggered with the data, as-is. In the former, the data has to be serialized, sent across a websocket, and deserialized by another process before the Python code is executed. All that said, it’s still slower than it should be.

I am afraid I don’t have a pat answer. Some quick investigation:

  • The code above is slow both in a notebook and in a standard bokeh serve app script, so the notebook is not relevant

  • The events_app.py example is not slow in the same way for me. Notably, it uses a scatter rather than an image

  • Slowness is also present on 2.4.0

  • Things seem much faster with 2.2.3

So, I would say this seems like a regression of some sort, and it seems especially evident with images (rather than scatters). There were some regressions that were known and fixed for 2.3.1, but apparently not all. I would advise filing a GitHub Issue with the test script (please remove the notebook parts unless the notebook is relevant; things are much easier for us without that complication).
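For the issue report, the notebook parts could be dropped roughly as follows. This is only a sketch of the same reproduction as a bokeh serve app: the file name main.py and the callback name on_selection are made up for illustration, and the nested loops are replaced with equivalent NumPy broadcasting so the script starts quickly.

```python
# main.py (hypothetical file name); run with: bokeh serve --show main.py
import numpy as np

from bokeh.events import SelectionGeometry
from bokeh.io import curdoc
from bokeh.plotting import figure

N = 1000

# Same RGBA test image as in the original example, built without Python loops.
img = np.empty((N, N), dtype=np.uint32)
view = img.view(dtype=np.uint8).reshape((N, N, 4))
ramp = (np.arange(N) / N * 255).astype(np.uint8)
view[:, :, 0] = ramp[:, None]   # red varies with the row index
view[:, :, 1] = 158             # constant green
view[:, :, 2] = ramp[None, :]   # blue varies with the column index
view[:, :, 3] = 255             # fully opaque

plot = figure(tools="lasso_select,box_select,wheel_zoom")
plot.image_rgba(image=[img], x=0, y=0, dw=10, dh=10)


def on_selection(event):
    # Deliberately empty: the slowdown shows up even with a no-op callback.
    pass


plot.on_event(SelectionGeometry, on_selection)

curdoc().add_root(plot)
```

Dragging a lasso or box selection over the image should then show the same slowdown as the notebook version, if the notebook is indeed not a factor.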

In the meantime, all I can advise is to use 2.2.x for now.

cc @mateusz

@mateusz FYI a very quick look doesn’t seem to show excessive network traffic (i.e. the other known issue that was fixed) but rather that now the CPU utilization jumps through the roof during the selection.

This is a serialization problem: the model referenced in the SelectionGeometry event is fully serialized. It is made worse by ndarrays being serialized with an inefficient base64 encoding, which aggravates an already bad situation. I will work around this until a generic differential serialization scheme is implemented.
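To give a rough sense of scale, here is a small stdlib-only sketch. It is not Bokeh's actual serialization code: the payload layout and the "image" key are invented for illustration. It only shows how much text a base64-encoded 1000x1000 RGBA buffer amounts to, compared with a payload that carries just a model reference.

```python
import base64
import json

N = 1000

# Stand-in for the 1000x1000 RGBA image from the example: 4 bytes per pixel.
raw = bytes(N * N * 4)

# Base64 maps every 3 input bytes to 4 output characters,
# so the encoded form is about a third larger than the raw buffer.
encoded = base64.b64encode(raw)
print("raw bytes:     ", len(raw))
print("base64 chars:  ", len(encoded))

# Hypothetical payload that drags the whole image along with the model.
full = json.dumps({"model": {"id": "1002", "image": encoded.decode("ascii")}})

# Payload that carries only a reference to the model.
ref = json.dumps({"model": {"id": "1002"}})
print("full payload:  ", len(full))
print("reference only:", len(ref))
```

If something of this size is built for each event while the selection is being dragged, that would be consistent with the CPU spike described above, even when only the small reference ends up on the wire.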

@benjamin, please report an issue.

@mateusz Are you certain? When I look at the websocket traffic with 2.3.1, the events look like this:

{"events":[{"kind":"MessageSent","msg_type":"bokeh_event","msg_data":{"event_name":"selectiongeometry","event_values":{"model":{"id":"1002"},"geometry":{"type":"poly","sx":[375,371,356,356],"sy":[163,174,221,221],"x":[6.556985294117647,6.47610294117647,6.172794117647059,6.172794117647059],"y":[7.466841186736475,7.2556719022687615,6.353403141361257,6.353403141361257]},"final":false}}}],"references":[]} 1618425039.4071987

There are no messages that come by containing the image data or any other references. Or do you mean that the serialization is occurring unnecessarily, and then being discarded?

Yes, the definition of the model is discarded, so the actual payload is small (this was fixed in 2.3.1), and eventually only the reference is kept.

Thank you for your help. I can confirm that it works with 2.2.3.

Before I report an issue, just a small remark.
If I try with a standalone script, it works smoothly even with 2.3.1.

import bokeh
from bokeh.models import CustomJS
from bokeh.plotting import figure, show
from bokeh.events import SelectionGeometry

import numpy as np

print(bokeh.__version__)
# Gives 2.3.1

# Choose 'python' or 'js'
case = 'python'

N = 1000
img = np.empty((N,N), dtype=np.uint32)
view = img.view(dtype=np.uint8).reshape((N, N, 4))
for i in range(N):
    for j in range(N):
        view[i, j, 0] = int(i/N*255)
        view[i, j, 1] = 158
        view[i, j, 2] = int(j/N*255)
        view[i, j, 3] = 255
plot = figure(tools="lasso_select, box_select, wheel_zoom")
plot.image_rgba(image=[img], x=0, y=0, dw=10, dh=10)

if case == 'python':
    plot.on_event(SelectionGeometry, lambda x: None)
    
else:
    plot.js_on_event(SelectionGeometry, CustomJS(code=""" """))

show(plot)

Is this expected behaviour?

Standalone bokeh scripts disallow python callbacks, and bokeh will print a warning to this effect although your script will still run (without any of the callback logic).

This seems entirely consistent with your observation that the python callback is a primary source of slowness.
