Avoiding pending writes and single document error when using CDS

Hello,
I created an account here to get help or hints on how to use ColumnDataSources in files other than main.py while avoiding the errors mentioned in the post title.
This doesn’t seem to be a bug but intended behavior, as explained in this GitHub issue.

I’m using (have to use) Bokeh version 1.0.2, but I’ve also tested version 1.2 and get the same errors.

My problem is the following, maybe best described by short examples:

#1 - CDS defined and manipulated in main.py

Code in main.py
from bokeh.plotting import figure
from bokeh.layouts import column, layout
from bokeh.models import Arrow,OpenHead
from bokeh.io import curdoc
from bokeh.models import ColumnDataSource


CDS  = ColumnDataSource(data=dict(xS=[], xE=[], yS=[], yE=[], lW = []))

CDS.data = dict(xS=[2.0], xE=[-1.0], yS=[1.0], yE=[1.0], lW = [2])

arrow_glyph = Arrow(end=OpenHead(line_color="#E37222",line_width= 2, size=5),
    x_start='xS', y_start='yS', x_end='xE', y_end='yE',line_width= "lW", source=CDS ,line_color="#E37222")

figure1 = figure(title="CDS and glyph in main", tools="", x_range=(-5,5), y_range=(-5,5),width=400,height=400)
figure1.add_layout(arrow_glyph)

doc_layout = layout(children=[column(figure1)])
curdoc().add_root(doc_layout)

#2 - CDS defined in external file

Code in main.py
from bokeh.plotting import figure
from bokeh.layouts import column, layout
from bokeh.models import Arrow,OpenHead
from bokeh.io import curdoc
from bokeh.models import ColumnDataSource

from ext_cds_def import CDS

CDS.data = dict(xS=[2.0], xE=[-1.0], yS=[1.0], yE=[1.0], lW = [2])

arrow_glyph = Arrow(end=OpenHead(line_color="#E37222",line_width= 2, size=5),
    x_start='xS', y_start='yS', x_end='xE', y_end='yE',line_width= "lW", source=CDS ,line_color="#E37222")

figure1 = figure(title="CDS externally and glyph in main", tools="", x_range=(-5,5), y_range=(-5,5),width=400,height=400)
figure1.add_layout(arrow_glyph)

doc_layout = layout(children=[column(figure1)])
curdoc().add_root(doc_layout)

Code in ext_cds_def.py
from bokeh.models import ColumnDataSource

CDS  = ColumnDataSource(data=dict(xS=[], xE=[], yS=[], yE=[], lW = []))

#3 - CDS manipulated in external file

Code in main.py
from bokeh.plotting import figure
from bokeh.layouts import column, layout
from bokeh.models import Arrow,OpenHead
from bokeh.io import curdoc
from bokeh.models import ColumnDataSource

from ext_cds_def import CDS

arrow_glyph = Arrow(end=OpenHead(line_color="#E37222",line_width= 2, size=5),
    x_start='xS', y_start='yS', x_end='xE', y_end='yE',line_width= "lW", source=CDS ,line_color="#E37222")

figure1 = figure(title="CDS externally and glyph in main", tools="", x_range=(-5,5), y_range=(-5,5),width=400,height=400)
figure1.add_layout(arrow_glyph)

doc_layout = layout(children=[column(figure1)])
curdoc().add_root(doc_layout)

Code in ext_cds_def.py
from bokeh.models import ColumnDataSource

CDS  = ColumnDataSource(data=dict(xS=[], xE=[], yS=[], yE=[], lW = []))

CDS.data = dict(xS=[2.0], xE=[-1.0], yS=[1.0], yE=[1.0], lW = [2])

You might notice that all examples work. However, when reloading the page (or closing and opening it again), example #2 yields a pending write error and example #3 yields a single document error.

I have already looked for similar threads, but none of those could help with my problem.
This one suggests replacing figure with Figure (capitalized F).
Here it is stated to “create completely new objects every time”.
Other threads had no solution provided or some new version fixed it for them.

It also seems to happen for other types, like buttons for example.

Currently the structure of the applications I’m working on is like this:

  • main.py (plotting the figures and building the page)
  • callback_functions.py (define all callback functions in this file)
  • helper_functions.py (define additional functions, which might be called inside some callback functions)
  • sources.py (define all ColumnDataSources or other global variables/constants)

My workaround for this is to use nested classes, since all functions and sources have to be in the same class to avoid the errors. Basically, I turned whole files into classes. Then I can build a single object in main.py and steer everything from there. Reloading isn’t an issue anymore, since it seems that the object gets destroyed, in contrast to the variant without classes. I hope you understand what I’m trying to explain :sweat_smile:
However, I’m not quite satisfied with this solution, since it makes the code quite bloated and less readable. On the other hand, I don’t want to restructure the whole large codebase using OOP; that would take too much time and effort (basically rewriting everything).

I have also stumbled across lifecycle hooks, which can detect when an application is closed. This solution isn’t satisfying either, since it takes 30-60 seconds until the function is called. It also only seems to work on version 1.2 (reloading works after the session destroy function has been called), but not on version 1.0.2 (the function is called, but reloading without an error still isn’t possible afterwards).

So my questions are:

  • Is there any way to move CDS, functions etc. out into external files/modules in the style mentioned above without getting pending writes and/or single document errors?
  • Can one destroy all (Bokeh) objects immediately after reloading/closing a page?
  • Why is this intended behavior and what’s the problem with destroying all objects of a single session/port when reloading/closing it, since you need all new objects anyway? Will this be fixed in a future release?
  • Do you have additional hints or remarks regarding this topic, or a generally preferred code structure for Bokeh applications?

If you can answer any of these questions or know better workarounds, feel free to leave a comment :slight_smile:

Is there any way to move CDS, functions etc. out into external files/modules in the style mentioned above without getting pending writes and/or single document errors?

You would have to put the creation inside a function, and then have the app code call that function, e.g.:

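# ext_cds.py
from bokeh.models import ColumnDataSource

def get_cds():
    # Called from the app code, so every session gets its own new CDS
    return ColumnDataSource(...)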

The purpose of Bokeh application code (whether it’s a script, or a notebook, or passed to a function handler) is to create a new, unique set of Bokeh models for every session. To this end, the app code is executed every time a session starts. But Python’s module caching means that module-scope (“global”) Bokeh objects in external modules cannot work, because those objects would be cached and re-used between sessions (which does not work and is not allowed). We can’t change Python’s module caching behavior, so this situation is not going to change.
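
The caching behavior can be reproduced without Bokeh at all. The following is only a sketch under assumptions: the module name sources_demo, the names SHARED and make_source, and the helper run_session are all made up for illustration, and a plain dict stands in for a ColumnDataSource. Calling run_session twice plays the role of two sessions re-running the app code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for an external sources module.
# A plain dict plays the role of a ColumnDataSource here.
MODULE_SOURCE = '''
SHARED = {"xS": []}        # module scope: built once, on first import

def make_source():
    return {"xS": []}      # built anew on every call
'''


def run_session(value):
    """Mimic one session: the app code runs again and imports the module."""
    sources = importlib.import_module("sources_demo")
    sources.SHARED["xS"].append(value)   # touches the cached, shared object
    fresh = sources.make_source()        # per-session object
    fresh["xS"].append(value)
    return sources.SHARED, fresh


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sources_demo.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        shared_a, fresh_a = run_session(1.0)
        shared_b, fresh_b = run_session(2.0)   # module comes from sys.modules
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("sources_demo", None)

print(shared_a is shared_b)            # True: one object for both "sessions"
print(shared_b["xS"])                  # [1.0, 2.0]
print(fresh_a is fresh_b)              # False: the factory built two objects
print(fresh_a["xS"], fresh_b["xS"])    # [1.0] [2.0]
```

The second import does not re-execute the module body, so anything created at module scope is the same object in both runs, while objects returned by the function are distinct.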

If you want to share large non-Bokeh data structures in modules between sessions, e.g. large arrays or DataFrames, that can be fine (but I recommend doing so only with read-only data). The spectrogram example demonstrates this. A CDS is a very thin wrapper around such things. You just have to make a new CDS so that each user has their own unique Bokeh objects for their own session. Bokeh objects cannot be shared between sessions. [1]
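
One way to picture this split, as a rough sketch rather than anything taken from the spectrogram example: keep the shared data immutable at module scope and build a fresh mutable copy per session. The names BASE_ARROW and make_arrow_data are hypothetical, and the values are borrowed from the arrow example in the question. In a real app the returned dict would be handed to ColumnDataSource(data=...) inside the app code; that call is left out so the sketch runs without Bokeh installed.

```python
from types import MappingProxyType

# Shared between sessions: loaded once at import time and never mutated.
# Tuples plus a read-only mapping make accidental writes fail loudly.
BASE_ARROW = MappingProxyType({
    "xS": (2.0,), "xE": (-1.0,), "yS": (1.0,), "yE": (1.0,), "lW": (2,),
})


def make_arrow_data():
    """Return a fresh, mutable dict of lists for one session.

    In a Bokeh app this would feed ColumnDataSource(data=make_arrow_data()).
    """
    return {name: list(values) for name, values in BASE_ARROW.items()}


session_a = make_arrow_data()
session_b = make_arrow_data()
session_a["xS"][0] = 4.0       # a change made in one session...

print(session_b["xS"])         # [2.0]   ...is not visible in another
print(BASE_ARROW["xS"])        # (2.0,)  and the shared data is untouched
```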

Can one destroy all (bokeh-)objects immediately after reloading/closing a page?

Bokeh objects are referred to from sessions, which are periodically reaped. They will be collected by normal Python GC after the session is expired. You can change the session expiration options but of course checking for unused sessions more frequently can have its own trade-offs, depending on the number of sessions you typically have, and how long they last, etc.

Why is this intended behavior and what’s the problem with destroying all objects of a single session/port when reloading/closing it, since you need all new objects anyway? Will this be fixed in a future release?

I don’t really understand what you are asking. Your issues all seem to come from not creating new objects for every session, and have nothing to do with when session objects are released.


  1. To be crystal clear about why Bokeh models cannot be shared between sessions: Imagine another user pans a plot and your view changes as a result, because a range is shared. Or worse, another user has access to sensitive data in your session because a CDS is shared. Sharing objects between sessions is inherently unsafe and surprising and cannot be permitted.

Hi, sorry for my late reply. I totally forgot to answer after starting another big project.
Your answer helped a lot; I wasn’t aware that this problem stems from Python’s caching behavior.
