Data structures and methods for efficient plot updates

I’m running a Bokeh server with a fairly large ColumnDataSource (CDS), roughly 5000 rows x 100 columns, and doing lots of plot updates, so update speed is crucial. I’m curious whether some data structures and methods are more efficient than others, e.g. in the following cases:

  • For the CDS creation, is there a performance difference in using python lists, numpy arrays, or pandas series?
  • For a single column (or few columns) update of the CDS, is it better to patch the entire column, to update the CDS data dictionary in place, or to update the data dictionary separately and reassign it to the CDS?

In addition, in cases of one-element lists in the CDS, is it better to patch the 0-index element, to stream with rollover=1, or to reassign a new data dictionary to the CDS?

Thanks!

I don’t have any hard benchmarks to provide. Here are observations drawn from my expertise with the codebase.

For the CDS creation, is there a performance difference in using python lists, numpy arrays, or pandas series?

A CDS holds references to everything inside it; no copies are made. So for creation time, there really should not be any appreciable difference. But only NumPy arrays and Pandas series can use the binary serialization protocol [1] over a Bokeh server websocket, so they can be considerably faster to update than lists are.
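As a rough illustration of the point above, here is a minimal sketch that builds a CDS of the size mentioned in the question from NumPy arrays. The column names (`col_0` ... `col_99`) and the random data are hypothetical, and float64 is assumed to be among the dtypes covered by the footnote.

```python
import numpy as np
from bokeh.models import ColumnDataSource

n_rows, n_cols = 5000, 100
rng = np.random.default_rng(0)

# One float64 NumPy array per column (hypothetical column names).
# Arrays, unlike plain Python lists, are eligible for binary transport.
data = {f"col_{i}": rng.random(n_rows) for i in range(n_cols)}

source = ColumnDataSource(data=data)
```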

For a single column (or few columns) update of the CDS, is it better to patch the entire column, to update the CDS data dictionary in place, or to update the data dictionary separately and reassign it to the CDS?

There is never any reason to use patch on an entire column. Bokeh has special-cased codepaths for updating entire individual columns at a time, in place, i.e. by assigning a new value to a single key of the .data dict. Assigning an entire new .data dict value will always send all the data, every time.
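A minimal sketch of the in-place column update described above, with the whole-dict reassignment shown only as a comment for contrast. The column names `x` and `y` and the sine data are made up for the example.

```python
import numpy as np
from bokeh.models import ColumnDataSource

n_rows = 5000
source = ColumnDataSource(data={
    "x": np.arange(n_rows, dtype=np.float64),
    "y": np.zeros(n_rows),
})

# Replace one whole column in place: only this column is treated as changed.
# The new array must have the same length as the other columns.
new_y = np.sin(source.data["x"] / 100.0)
source.data["y"] = new_y

# For contrast, reassigning the whole dict resends every column:
# source.data = {"x": source.data["x"], "y": new_y}
```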

In addition, in cases of one-element lists in the CDS, is it better to patch the 0-index element, to stream with rollover=1, or to reassign a new data dictionary to the CDS?

I can’t imagine it could matter much either way with only a single element.
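For reference, the three options from the question look roughly like this on a one-element CDS. This is only a sketch with a hypothetical column named `value`; it shows that all three leave the source holding a single new element, and makes no claim about their relative speed.

```python
from bokeh.models import ColumnDataSource

source = ColumnDataSource(data={"value": [1.0]})

# Option 1: patch the element at index 0.
source.patch({"value": [(0, 2.0)]})
after_patch = list(source.data["value"])

# Option 2: stream one new element and keep only the latest one.
source.stream({"value": [3.0]}, rollover=1)
after_stream = list(source.data["value"])

# Option 3: reassign a whole new data dict.
source.data = {"value": [4.0]}
after_reassign = list(source.data["value"])
```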


  1. For supported dtypes.


Thanks a lot @Bryan, this is very useful! I was not aware of updating an entire column in place. The docs on modifying data only mention adding a new column, and then describe using patch for replacing data.

When updating this way, you have to be careful that the new column is exactly the same length as all the other existing columns; otherwise the behavior is undefined. We tend to advertise features less if they might lead users into issues.
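One way to guard against the length mismatch mentioned above is a small wrapper that checks the new column before assigning it. `set_column` is a hypothetical helper written for this example, not part of Bokeh.

```python
import numpy as np
from bokeh.models import ColumnDataSource


def set_column(source, name, values):
    """Hypothetical helper: replace one column in place after a length check."""
    # Compare against every other existing column, not the one being replaced.
    for other, column in source.data.items():
        if other != name and len(column) != len(values):
            raise ValueError(
                f"column {name!r} has length {len(values)}, "
                f"but {other!r} has length {len(column)}"
            )
    source.data[name] = values


source = ColumnDataSource(data={"x": np.arange(5.0), "y": np.zeros(5)})
set_column(source, "y", np.ones(5))      # same length: accepted

try:
    set_column(source, "y", np.ones(3))  # wrong length: rejected
    rejected = False
except ValueError:
    rejected = True
```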
