Bokeh is too slow, please help

Scoodood · August 22, 2020, 6:53am

hi guys,

I am new to Bokeh. I tried to run the following code in JupyterLab with Python 3.8 and Bokeh 2.1.1, but it’s extremely slow and totally unusable.

import numpy as np
from bokeh.plotting import figure, show
from bokeh.models import PrintfTickFormatter

p = figure(plot_width=1920, plot_height=1080, output_backend="webgl",
           tools='xpan,box_zoom,zoom_in,zoom_out,undo,reset')

# 12 traces of 2 million random samples each
y = np.random.random(size=(12, 2000000))
x = np.arange(y.shape[1])

# Offset each trace vertically by 2 so the lines do not overlap
for n, line in enumerate(y):
    p.line(x, line + n * 2, legend_label=str(n))

p.xaxis.formatter = PrintfTickFormatter(format="%d")
p.legend.location = "top_left"
p.legend.click_policy = "hide"
show(p)

My PC has an i7-9700K, 48 GB DDR4, a 1 TB NVMe SSD, and an RTX 2070 Super GPU, running Win10 Pro.
I was told Bokeh was designed to crunch large data, so I guess I must be doing it wrong. What else can I do to speed this up? Please advise.

Bryan · August 22, 2020, 7:20am

You are absolutely going to have to use a tool like Datashader in conjunction with Bokeh for anything of that magnitude. Bokeh is a browser plotting library. Above you are sending

12 * 2M * 2 * 4bytes/float ~ 200 MB

to the browser. All of that has to be serialized and deserialized. Then it all also gets mapped to screen coordinates, so you now have about half a gigabyte in one browser tab. [1] You might have better luck with a Bokeh server application, where the NumPy arrays can be sent directly over a websocket into typed arrays without any encoding cost, but it’s still probably going to be slow.
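As a rough check of the arithmetic above, the estimate can be reproduced in a few lines of plain Python. The variable names are made up for illustration, and the doubling for screen coordinates is an assumption (that the mapped copy is about as large as the data itself), not a measured figure.

```python
# Back-of-the-envelope payload estimate for the plot in the question.
n_lines = 12
n_points = 2_000_000
coords_per_point = 2  # one x and one y value per point


def payload_mb(bytes_per_value: int) -> float:
    """Raw size of all coordinates in megabytes (1 MB = 10**6 bytes)."""
    return n_lines * n_points * coords_per_point * bytes_per_value / 1e6


raw_f32 = payload_mb(4)  # 4-byte floats (Float32Array)
raw_f64 = payload_mb(8)  # 8-byte floats (Float64Array)

# Assumption: the screen-coordinate copy is roughly as large as the data.
with_screen_f32 = 2 * raw_f32
with_screen_f64 = 2 * raw_f64

print(f"float32: {raw_f32:.0f} MB raw, ~{with_screen_f32:.0f} MB with screen coords")
print(f"float64: {raw_f64:.0f} MB raw, ~{with_screen_f64:.0f} MB with screen coords")
```

Under that assumption, the 4-byte case lands near the "half a gigabyte" mentioned above, and the 8-byte case is about twice that.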

Datashader can render the data efficiently on the Python side using Numba and/or cuDF, and the resulting rendered image can be displayed in a Bokeh plot. This vastly reduces the communication and storage overhead for the browser.
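To illustrate why a server-side raster is so much smaller than the raw points, here is a minimal sketch that bins one trace into a fixed pixel grid. It uses `np.histogram2d` as a stand-in and is not Datashader's API; it also counts points per pixel rather than drawing the line segments between them, which Datashader would do. The grid size and seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 2_000_000  # one trace from the question
x = np.arange(n_points, dtype=np.float64)
y = rng.random(n_points)

width, height = 960, 540  # size of the output raster in pixels

# Count how many samples fall into each pixel-sized bin.
# Passing (y, x) makes rows correspond to y and columns to x.
counts, _, _ = np.histogram2d(
    y, x,
    bins=(height, width),
    range=[[0.0, 1.0], [0.0, float(n_points)]],
)
counts = counts.astype(np.uint32)

raw_bytes = x.nbytes + y.nbytes  # what would otherwise go to the browser
image_bytes = counts.nbytes      # fixed by the raster size, not by n_points

print(f"raw points: {raw_bytes / 1e6:.1f} MB, raster: {image_bytes / 1e6:.1f} MB")
```

The raster size depends only on `width` and `height`, so it stays the same whether the trace has two million or two billion samples. An array like `counts` could then be shown in a Bokeh figure with the `image` glyph method.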

If you need interactivity over that much data, then the very high-level HoloViews library can automatically coordinate Datashader rendering in Bokeh plots in response to interactive panning, zooming, or other events. They have nice examples of this, even up to billion-point datasets.

[1] And this number is actually anticipating the next release, which switches the default from Float64Array to Float32Array. Really, presently, you’d be looking at nearly a gigabyte per tab.

Bryan · August 22, 2020, 7:34am

Here, specifically, is the chapter on HV+Datashader:

Working with large data using datashader — HoloViews 1.14.5 documentation

Scoodood · August 22, 2020, 7:40am

hi @Bryan, thanks for the pointer. Before experimenting with Bokeh I was trying PySide2 + PyQtGraph, and had experimented with a downsampling technique to reduce the number of data points drawn on the screen when I zoom out. When I zoom in, it recalculates the density and automatically fills in the detail so that the number of data points stays the same. Another experiment I did was to convert the lines into QtCharts.QPath objects. Both outcomes were pretty good, but the latter was much better. I believe Bokeh can do a similar optimization.

Bryan · August 22, 2020, 7:43am

@Scoodood You can certainly do that kind of viewport-based downsampling with plain Bokeh, including for interactive use, but you would definitely need a live Bokeh server application to avoid sending the entire data set to the browser. This is effectively what HoloViews can do for you automatically with Datashader (otherwise, with plain Bokeh, you will be implementing the downsampling yourself).
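A minimal sketch of the downsampling step itself, in NumPy only, might look like the following. The function name `downsample_viewport` and the min/max bucketing scheme are illustrative assumptions, not something Bokeh provides; the sketch also assumes `x` is sorted in ascending order.

```python
import numpy as np


def downsample_viewport(x, y, x_start, x_end, max_points=4000):
    """Return roughly at most `max_points` samples inside [x_start, x_end].

    Hypothetical helper: assumes `x` is sorted ascending. The visible slice
    is split into buckets and the min and max of each bucket are kept, so
    narrow spikes remain visible after decimation.
    """
    lo = np.searchsorted(x, x_start, side="left")
    hi = np.searchsorted(x, x_end, side="right")
    xs, ys = x[lo:hi], y[lo:hi]
    n = len(xs)
    if n <= max_points:
        return xs, ys  # few enough points: send them all

    n_buckets = max(1, max_points // 2)  # two samples (min, max) per bucket
    bucket = -(-n // n_buckets)          # ceil(n / n_buckets)
    k = -(-n // bucket)                  # buckets actually needed
    pad = k * bucket - n
    # Pad the tail with the last value so the slice reshapes evenly;
    # repeating a value does not change the bucket's min or max.
    ys_padded = np.concatenate([ys, np.full(pad, ys[-1])])
    buckets = ys_padded.reshape(k, bucket)

    offsets = np.arange(k) * bucket
    idx = np.concatenate([buckets.argmin(axis=1) + offsets,
                          buckets.argmax(axis=1) + offsets])
    # Clamp defensively to the real data, then sort and drop duplicates.
    idx = np.unique(np.minimum(idx, n - 1))
    return xs[idx], ys[idx]


# Example: one 2M-sample trace with a single spike.
x = np.arange(2_000_000)
y = np.random.default_rng(0).random(x.size)
y[1_234_567] = 5.0

x_out, y_out = downsample_viewport(x, y, 0, 2_000_000)      # zoomed out
x_zoom, y_zoom = downsample_viewport(x, y, 1000, 1999)      # zoomed in
```

In a Bokeh server application, a function like this could be called from a callback registered on the plot's `x_range` (for example via `on_change` on its `start` and `end` properties), with the result assigned to a `ColumnDataSource`, so that only the decimated points for the current view cross the websocket.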

Scoodood · August 22, 2020, 8:52pm

hi @Bryan, thanks again. I will explore HoloViews as well.