Explanation of Out-of-Memory Error when drawing large Images

kaprisonne · August 2, 2021, 2:25pm

Hi,

To my knowledge, Bokeh draws an Image glyph pixel by pixel onto the HTML canvas element (correct me if I am wrong).
I have done some testing and can see that drawing a big image (e.g., 2000x2000 pixels) directly, using JS, consumes less memory than using Bokeh to draw an Image glyph of the same size.
Both will, at some point, cause an out-of-memory error, though.

Where does that Bokeh overhead come from? Why does it take so much RAM to draw these pixels in the first place? Are they all held in memory?

Furthermore: why does it take so long to initially draw such big images, while further adjustments to the Bokeh glyph happen almost instantaneously? For example, changing the palette takes almost no time to update the colors of all pixels.

Thanks!

Bryan · August 2, 2021, 4:15pm

Hi @kaprisonne, any performance-related concern should always begin with a concrete, complete Minimal Reproducible Example that can be profiled and traced directly, so that there is no confusion or speculation about what to measure, how to run it, or what exactly is being seen. Without specifics it is impossible to say whether there is a usage problem, a bug, a place to look for improvement, or just some intrinsic limitation.

kaprisonne · August 2, 2021, 6:33pm

Hi,

Interestingly enough, I now get this error when trying to draw a 5000x5000 image (I used to get a sad, dead face telling me I was out of memory):
[bokeh] Failed to load Bokeh session WRqQAJrkmquYCdk2HH99oOvN20v6AJv9CdzfULi9EXU6: RangeError: Invalid array length

This is the Bokeh code:


import numpy as np

from bokeh.io import curdoc
from bokeh.plotting import figure
from bokeh.server.server import Server
from bokeh.settings import settings
settings.minified = False

def create_plot():
    fig = figure(width=5000, height=5000)
    fig.image(dw=5000, dh=5000, x=0, y=0, image=[np.random.rand(5000, 5000)])

    return fig


#################################
######### Boilerplate ###########
#################################

if __name__ == '__main__':
    def add_root(doc):
        doc.add_root(create_plot())
    server = Server({'/': add_root})
    server.start()
    server.io_loop.add_callback(server.show, "/")
    server.io_loop.start()
else:
    curdoc().add_root(create_plot())

This is the HTML equivalent, drawing an image of that size pixel by pixel onto an HTML canvas:

https://jsfiddle.net/oufka37j/

Now, I realize this is a completely unrealistic example, because normally you should downsample images that big (which I actually am doing), but I care more about learning some of the intrinsics of Bokeh and what limitations it has (and for what reason, e.g., because HTML/JS cannot handle it).

Bryan · August 2, 2021, 7:01pm

The fiddle is doing something entirely different than what Bokeh does.

The fiddle creates a 5k canvas and loops to draw on it, never actually storing any of the data used for the drawing. The size of that canvas is roughly 100 MB (5000 x 5000 pixels x 32-bit RGBA). That's starting to be a lot for one browser tab, but (evidently) not unmanageable.

By contrast, the Bokeh app serializes the entire scalar array in Python and sends it over the network to a Float64Array typed array in the browser. That right there is ~200 MB, since NumPy defaults to float64. Now, since we have used the image glyph, there is a colormapping step in the browser (that is what image is for). That means creating another 100 MB RGBA array to then display by passing to ctx.drawImage on the 100 MB canvas. So now you are pushing close to half a GB of data in one browser tab, and that's just off the top of my head. Bokeh is intended to send all the data, to afford interactivity over that data, but past a certain point browsers will not handle that mode of use.
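The arithmetic behind those figures can be written down in a few lines of Python. This is a back-of-the-envelope sketch under stated assumptions (float64 input, 4 bytes per RGBA pixel, a canvas with the same pixel count as the image), not a measurement of what Bokeh or the browser actually allocates; the helper names are hypothetical.

```python
# Back-of-the-envelope memory estimate for an N x N image.
# Assumptions (not measurements): float64 input, 4 bytes per RGBA pixel,
# and a canvas with the same pixel count as the image. Browsers add their
# own overhead on top, so these are lower bounds.

MB = 1_000_000  # decimal megabytes, matching the round numbers in the thread


def bokeh_image_budget(n: int) -> dict:
    """Buffers described for the Bokeh `image` glyph path (hypothetical helper)."""
    pixels = n * n
    return {
        "float64 scalar array": pixels * 8,    # data sent from Python
        "colormapped RGBA array": pixels * 4,  # output of browser colormapping
        "RGBA canvas": pixels * 4,             # the canvas it is drawn onto
    }


def fiddle_budget(n: int) -> dict:
    """The fiddle keeps only the canvas; the drawing data is not stored."""
    return {"RGBA canvas": n * n * 4}


if __name__ == "__main__":
    for label, budget in (("Bokeh image glyph", bokeh_image_budget(5000)),
                          ("plain canvas (fiddle)", fiddle_budget(5000))):
        print(f"{label}: ~{sum(budget.values()) / MB:.0f} MB")
        for part, size in budget.items():
            print(f"    {part}: ~{size / MB:.0f} MB")
```

For N = 5000 this gives roughly 400 MB for the Bokeh path versus 100 MB for the fiddle, in line with the "close to half a GB" estimate; the real footprint is likely higher because serialization buffers and browser bookkeeping are not counted here.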

Bokeh is not magic, and if you want to just draw a 5k canvas while throwing away all the data that went into the drawing, Bokeh is not the right tool for that job. Otherwise, some options:

Colormap the data yourself in Python, and use image_rgba instead of image. That will probably halve the data usage right there.
Downsample in Python. You could do this manually, but you could also leverage high-level tools like HoloViews, which can efficiently coordinate Bokeh and Datashader for large data sets.
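The first option can be sketched with NumPy alone. `colormap_to_rgba` is a hypothetical helper: it maps the data linearly onto a list of hex colors (a simplified stand-in for a linear color mapper, not Bokeh's own implementation) and packs the result into the uint32-per-pixel layout that `image_rgba` takes, built as a uint8 view of shape (H, W, 4) the same way Bokeh's `image_rgba` examples do. It assumes finite (non-NaN) input.

```python
from typing import Sequence

import numpy as np


def colormap_to_rgba(data: np.ndarray, palette: Sequence[str]) -> np.ndarray:
    """Map a 2D float array onto a palette of '#rrggbb' strings.

    Returns an (H, W) uint32 array whose bytes are R, G, B, A per pixel.
    Assumes `data` is finite (no NaN/inf).
    """
    # Decode the palette into an (n_colors, 3) uint8 lookup table.
    lut = np.array([[int(c[i:i + 2], 16) for i in (1, 3, 5)] for c in palette],
                   dtype=np.uint8)

    # Linearly map [min, max] onto palette indices 0 .. n_colors - 1.
    lo, hi = float(data.min()), float(data.max())
    n = len(palette)
    scale = (n - 1) / (hi - lo) if hi > lo else 0.0
    idx = np.clip(np.rint((data - lo) * scale), 0, n - 1).astype(np.intp)

    h, w = data.shape
    rgba = np.empty((h, w, 4), dtype=np.uint8)
    rgba[..., :3] = lut[idx]  # R, G, B looked up per pixel
    rgba[..., 3] = 255        # fully opaque
    # Reinterpret each pixel's 4 bytes as one uint32 (a view, no copy).
    return rgba.view(np.uint32).reshape(h, w)


if __name__ == "__main__":
    data = np.random.rand(500, 500)  # float64: 8 bytes per pixel
    img = colormap_to_rgba(data, ["#000000", "#808080", "#ffffff"])
    print(data.nbytes, img.nbytes)   # the uint32 image is half the size
```

The returned array would then be passed as `fig.image_rgba(image=[img], x=0, y=0, dw=5000, dh=5000)`. Besides sending 4 bytes per pixel instead of 8, this skips the browser-side colormapping step; the trade-off is that the palette can no longer be changed in the browser without re-sending the data.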

kaprisonne · August 2, 2021, 7:26pm

Thank you very much for the response; it provides a bunch of useful insights. I was under the impression that a browser tab could use as much RAM as the system has available without bogging down.

I was actually using Datashader for downsampling; however, due to its massive import time, I decided to use the underlying algorithm directly instead (https://github.com/esa-esdl/gridtools/blob/master/gridtools/resampling.py).
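The simplest manual downsampling is block averaging, sketched below with NumPy. This is a swapped-in technique for illustration: it is neither the gridtools resampling code linked above nor what Datashader does, and `block_mean_downsample` is a hypothetical helper. It assumes a 2D array and trims any rows or columns that do not fill a complete block.

```python
import numpy as np


def block_mean_downsample(data: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of a 2D array."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    h, w = data.shape
    # Drop trailing rows/columns that would form incomplete blocks.
    h_trim, w_trim = h - h % factor, w - w % factor
    trimmed = data[:h_trim, :w_trim]
    # Split each axis into (n_blocks, factor) and average within each block.
    blocks = trimmed.reshape(h_trim // factor, factor, w_trim // factor, factor)
    return blocks.mean(axis=(1, 3))


if __name__ == "__main__":
    img = np.random.rand(1000, 1000)       # float64 input
    small = block_mean_downsample(img, 5)  # 200 x 200
    print(small.shape, small.nbytes)
```

Applied to the 5000x5000 float64 array from the example with `factor=5`, the result would be 1000x1000, about 8 MB instead of 200 MB, before any colormapping.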

Thanks for the help! I hope to showcase my all-Bokeh app once I am done with my thesis.

Bryan · August 2, 2021, 7:27pm

Also note that at these sizes the websocket transfer itself is not trivial. Even scaling down to N=2000, the websocket transit time alone was ~7 s, and that was on localhost.

Bryan · August 2, 2021, 7:28pm

AFAIK, different browsers implement different hard per-tab resource limits, which also may or may not be configurable.

kaprisonne · August 2, 2021, 7:30pm

Nice! Knowing of these (hidden) limitations is invaluable to me.

Bryan · August 2, 2021, 7:32pm

One other difference to mention: you created a 5k canvas with figure, but that does not account for axis and title border areas or range padding. The 5k RGBA image gets drawn into an area smaller than 5000x5000 in the middle of the canvas. That means the browser has to do its own rescaling, antialiasing, neighbor interpolation, etc. when drawImage is called, which also may not be trivial.
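To put a rough number on that rescaling, the sketch below estimates the per-axis shrink factor the browser has to apply. The border size is a hypothetical value chosen for illustration, and the 10% padding is an assumption about the default `DataRange1d` `range_padding`; `onscreen_scale` is a hypothetical helper, not a Bokeh API.

```python
def onscreen_scale(image_px: int, canvas_px: int, border_px: int,
                   range_padding: float = 0.1) -> float:
    """Estimate the per-axis factor by which the browser shrinks the image.

    image_px      -- image size along the axis, in source pixels
    canvas_px     -- figure width or height passed to figure()
    border_px     -- space used by axes, title and borders on that axis
                     (hypothetical; depends on the actual layout)
    range_padding -- fractional padding of the auto-computed data range
                     (0.1 assumed here as the DataRange1d default)
    """
    frame_px = canvas_px - border_px
    if frame_px <= 0:
        raise ValueError("borders leave no room for the plot frame")
    # With padding p the range spans (1 + p) times the image extent,
    # so the image covers only 1 / (1 + p) of the frame.
    image_on_screen_px = frame_px / (1 + range_padding)
    return image_px / image_on_screen_px  # > 1 means the image is downscaled


if __name__ == "__main__":
    # 5000 px image in a 5000 px figure with a hypothetical 50 px of axes/borders
    print(f"{onscreen_scale(5000, 5000, 50):.3f}")  # prints 1.111
```

Fixing the ranges explicitly (e.g. `x_range=(0, 5000)`) removes the padding term, and sizing the inner frame with `frame_width`/`frame_height` rather than the whole canvas should bring the factor close to 1; whether that noticeably shortens the draw would need profiling.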

system · October 31, 2021, 7:33pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.