Bokeh ceases to stream data after intermittent time interval

bel-lau · August 16, 2019, 5:56pm

To start off, if this sounds at all familiar, I am picking up a project that has been experiencing this issue since v0.12.15. The person working on the software at the time created a post about the problem in the GitHub issue section. For whatever reason he did not follow up on the problem, and the software has been running on Bokeh version 0.12.14 ever since. Since picking up the project I’ve tried running it with more recent Bokeh versions in the hope that the problem had been fixed, but all versions after 0.12.14 seem to exhibit the same problem.

In summary, we use a ColumnDataSource to stream new values to two Bokeh plots, currently updating at about 5 Hz. Using Bokeh version 0.12.14 and older, we can successfully stream data for days at a time with the plot updating accordingly. However, when using version 0.12.15 or newer (tested through 1.3.4), the plot stops updating after a variable interval, usually 10 to 25 minutes, despite continuing stream() calls. When this freeze occurs, the Bokeh server process continues running but seems to be unresponsive.

As suggested by Bryan on the GitHub issue post, I began by bisecting the dev builds between the 0.12.14 and 0.12.15 releases. I was able to narrow down the change as occurring between 0.12.15dev2 and 0.12.15dev3. From there I was able to further narrow down the problem to a single commit: 3ede9f0. Whatever is causing the problem seems to be tied to a specific change to the WSHandler in bokeh/server/views/ws.py.

v0.12.14:

@gen.coroutine
def write_message(self, message, binary=False, locked=True):
    ''' Override parent write_message with a version that acquires a
    write lock before writing.
    '''
    def write_message_unlocked():
        future = super(WSHandler, self).write_message(message, binary)
        # don't yield this future or we're blocking on ourselves!
        raise gen.Return(future)
    if locked:
        with (yield self.write_lock.acquire()):
            write_message_unlocked()
    else:
        write_message_unlocked()

v0.12.15:

@gen.coroutine
def write_message(self, message, binary=False, locked=True):
    ''' Override parent write_message with a version that acquires a
    write lock before writing.
    '''
    if locked:
        with (yield self.write_lock.acquire()):
            yield super(WSHandler, self).write_message(message, binary)
    else:
        yield super(WSHandler, self).write_message(message, binary)

I noticed that although the server-side WSHandler was changed, the client-side write_message routine (bokeh/client/websocket.py) still uses the write_message_unlocked function, returning a future. Would anyone be willing to share the reasoning behind this change? Does anyone have any insight into why this change may have caused the Bokeh plot to stop updating after a variable interval?
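One way to read the difference between the two versions quoted above: in 0.12.14 the lock is held only while the write is started and the pending future is handed back, whereas in 0.12.15 the lock is held until the write itself completes. The sketch below is only an analogy written with plain asyncio, not Bokeh or Tornado code; slow_write, write_old_style, and write_new_style are hypothetical names, and it does not show that this difference is what causes the freeze.

```python
import asyncio


async def slow_write(log, name, delay):
    """Hypothetical stand-in for a socket write that takes `delay` seconds."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)
    log.append(f"done {name}")


async def write_old_style(lock, log, name, delay):
    # 0.12.14-like: hold the lock only while the write is started,
    # then return the pending write without waiting for it.
    async with lock:
        pending = asyncio.ensure_future(slow_write(log, name, delay))
    return pending


async def write_new_style(lock, log, name, delay):
    # 0.12.15-like: keep holding the lock until the write has completed.
    async with lock:
        await slow_write(log, name, delay)


async def demo():
    old_log, new_log = [], []

    lock = asyncio.Lock()
    pending = [await write_old_style(lock, old_log, n, 0.05) for n in ("a", "b")]
    await asyncio.gather(*pending)

    lock = asyncio.Lock()
    await asyncio.gather(
        write_new_style(lock, new_log, "a", 0.05),
        write_new_style(lock, new_log, "b", 0.05),
    )
    return old_log, new_log


old_log, new_log = asyncio.run(demo())
print(old_log)  # both writes start before either one finishes
print(new_log)  # write "b" cannot start until write "a" is done
```

In the second pattern, a write that never completes would keep the lock held, so every later locked write would wait behind it. Whether that is what happens in the real server is an open question here.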

As an additional note, I am attempting to create a stripped down reproducer for this issue, though it is taking some time because I don’t know what is contributing to the issue, and it takes 10-40 minutes to run a single test for the bug. However, as soon as I have one that I am confident in I will post it here.

Thank you,

Lauren

Bryan · August 16, 2019, 6:38pm

@bel-lau I should first point out this recent reply in another thread:

To reiterate what is said there, this kind of approach (e.g. running a “blank” bokeh server and pushing everything to it from outside processes) is not at all the way the core developers intend for the bokeh server to be used. The bokeh.client has extremely narrow intended applications: 1) testing, and 2) making one time up front tweaks (i.e. per user customizations) to server-generated sessions before embedding them in rendered pages.

Would anyone be willing to share the reasoning behind this change?

The change was made in the server code to simplify it and support better logging. If making a similar change in the client code helps your use case (I doubt it), then we’d be happy to consider a PR. Though again, I cannot stress enough: this kind of usage is explicitly discouraged by us (which is why there are no examples of this sort of thing anywhere in the docs or examples). So, I would be much more interested in discussing how to help you restructure your app in a more supported way.

bel-lau · August 16, 2019, 7:28pm

I did see the response you gave to the other user, but I guess I must have been confused.

After looking into a bit more of the server documentation I think I have a better understanding of what you mean - I believe I have had the wrong impression of what roles the server and client were meant to serve. The application currently communicates with an external machine over UART to retrieve data used to update the Bokeh plot as well as other elements of the GUI. Would I be correct in saying that I should move communication and parsing to the server-side application and use callbacks in the Bokeh server for all element updates that are triggered by incoming packets, not just updates to the plot? If that’s the case, what responsibilities should the client side maintain, if any?

Again, sorry for the confusion - I really appreciate the help.

Lauren

Bryan · August 21, 2019, 1:12am

Would I be correct in saying that I should move communication and parsing to the server-side application and use callbacks in the Bokeh server for all element updates that are triggered by incoming packets, not just updates to the plot? If that’s the case, what responsibilities should the client side maintain, if any?

Well, for starters I would definitely say that all the Bokeh plotting, widgets, etc. code should go in a Bokeh server app that actually runs on the Bokeh server for each session. Then the question is how to get data updates to the app sessions, and what I might recommend depends on several factors:

Should every session see exactly the same data (i.e. if two browsers have the app open)?
Is it sufficient for the page to update by polling (i.e. a “pull” model for updates)?

Because there are lots of possibilities:

If a “pull” model is OK, then the server sessions could poll an external data source with add_periodic_callback or possibly even the browser could poll directly, outside the server with AjaxDataSource. Or threads could poll, either per-session or per-server (depending on whether every session sees the exact same data)
If a “push” model is needed (i.e. you want the external data source to trigger immediate updates by sending new data somehow) then:
- potentially a database could be used to synchronize things
- or you could embed the Bokeh server programmatically and add custom Tornado request handlers to handle data pushes (I think Dask does things this way for its diagnostic dashboard)
- probably other things too
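As a rough illustration of the “pull” option in the list above, a Bokeh server app can poll with add_periodic_callback and stream the result into a ColumnDataSource. This is a minimal sketch under assumptions, not a tested recommendation: fetch_latest is a hypothetical placeholder for whatever external data source gets polled, and the column names, the 200 ms period (about 5 Hz), and the rollover value are arbitrary choices.

```python
import random
import time

from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# One data source and plot per session; this script runs once per session.
source = ColumnDataSource(data=dict(t=[], value=[]))
plot = figure(title="Polled data (sketch)")
plot.line("t", "value", source=source)


def fetch_latest():
    """Hypothetical placeholder: ask the external data source for one sample."""
    return time.time(), random.random()


def update():
    # Runs on the server's event loop, so it is safe to modify the document.
    t, value = fetch_latest()
    # rollover caps how many points are kept on the server and in the browser.
    source.stream(dict(t=[t], value=[value]), rollover=500)


curdoc().add_root(plot)
curdoc().add_periodic_callback(update, 200)  # period is in milliseconds
```

Saved under a name of your choosing (say app.py), this would be started with `bokeh serve app.py`, with no separate bokeh.client process pushing data in.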

Given all that, it’s still up in the air whether the push goes directly to the bokeh server, or some intermediate other process first. That’s not something I can really advise on.

This kind of usage is definitely outside the envelope of basic, typical usage, so there are lots of possible approaches and the right one depends on specifics.
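For the UART case described earlier in the thread, one of the thread-based variants might look roughly like the following: a background thread does the blocking reads and only touches a queue, while a periodic callback inside the app drains the queue and updates the plot. Everything here is an assumption for illustration: read_packet is a hypothetical stand-in for the real UART read and parsing, and starting one reader thread per session is only one of the per-session or per-server choices mentioned above.

```python
import queue
import random
import threading
import time

from bokeh.io import curdoc
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

packets = queue.Queue()   # thread-safe handoff between reader and callback
stop = threading.Event()  # lets the session shut its reader thread down


def read_packet():
    """Hypothetical stand-in for a blocking UART read plus parsing."""
    time.sleep(0.2)
    return time.time(), random.random()


def reader():
    # Runs in a background thread; never touches Bokeh objects directly.
    while not stop.is_set():
        packets.put(read_packet())


source = ColumnDataSource(data=dict(t=[], value=[]))
plot = figure(title="UART data (sketch)")
plot.line("t", "value", source=source)


def drain():
    # Runs on the server's event loop; collects everything queued so far.
    ts, values = [], []
    while True:
        try:
            t, value = packets.get_nowait()
        except queue.Empty:
            break
        ts.append(t)
        values.append(value)
    if ts:
        source.stream(dict(t=ts, value=values), rollover=1000)


doc = curdoc()
doc.add_root(plot)
doc.add_periodic_callback(drain, 200)  # milliseconds
doc.on_session_destroyed(lambda session_context: stop.set())
threading.Thread(target=reader, daemon=True).start()
```

Other GUI elements that depend on incoming packets could be updated from the same drain callback, since it runs inside the session like any other server callback.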