To start off, in case this sounds familiar: I am picking up a project that has been experiencing this issue since v0.12.15. The person working on the software at the time opened an issue about the problem on GitHub. For whatever reason he did not follow up on it, and the software has been pinned to Bokeh 0.12.14 ever since. Since taking over the project, I have tried running it with more recent Bokeh versions in the hope that the problem had been fixed, but every version after 0.12.14 seems to exhibit the same behavior.
In summary, we use a ColumnDataSource to stream new values to two Bokeh plots, currently updating at about 5 Hz. With Bokeh 0.12.14 and older, we can stream data for days at a time and the plots update accordingly. With 0.12.15 or newer (tested through 1.3.4), however, the plots stop updating after a variable period, usually between 10 and 25 minutes, even though stream() calls continue. When this freeze occurs, the Bokeh server process keeps running but appears to be unresponsive.
As Bryan suggested on the GitHub issue, I began by bisecting the dev builds between the 0.12.14 and 0.12.15 releases. I narrowed the change down to somewhere between 0.12.15dev2 and 0.12.15dev3, and from there to a single commit: 3ede9f0. Whatever is causing the problem seems to be tied to a specific change to the WSHandler in
Before (0.12.15dev2):

```python
@gen.coroutine
def write_message(self, message, binary=False, locked=True):
    ''' Override parent write_message with a version that acquires a
    write lock before writing.

    '''
    def write_message_unlocked():
        future = super(WSHandler, self).write_message(message, binary)
        # don't yield this future or we're blocking on ourselves!
        raise gen.Return(future)
    if locked:
        with (yield self.write_lock.acquire()):
            write_message_unlocked()
    else:
        write_message_unlocked()
```
After (commit 3ede9f0):

```python
@gen.coroutine
def write_message(self, message, binary=False, locked=True):
    ''' Override parent write_message with a version that acquires a
    write lock before writing.

    '''
    if locked:
        with (yield self.write_lock.acquire()):
            yield super(WSHandler, self).write_message(message, binary)
    else:
        yield super(WSHandler, self).write_message(message, binary)
```
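To make the behavioral difference between these two versions concrete, below is a minimal sketch of one possible mechanism. This is an untested guess on my part, not a diagnosis. It uses asyncio instead of Tornado's gen.coroutine and tornado.locks.Lock so that it runs on its own. FakeSocket, Handler, write_unawaited, write_awaited and demo are hypothetical names for illustration only; they are not Bokeh APIs. The idea: if a single websocket write future never resolves, the newer pattern keeps holding write_lock, so every later message queues behind it while the process itself stays alive.

```python
import asyncio


class FakeSocket:
    """Hypothetical stand-in for a websocket connection whose writes can stall."""

    def __init__(self):
        self.sent = []
        self.stalled = False

    def write_message(self, message):
        # Return a future that resolves once the write is "flushed".
        # While stalled, the future is never resolved.
        fut = asyncio.get_running_loop().create_future()
        self.sent.append(message)
        if not self.stalled:
            fut.set_result(None)
        return fut


class Handler:
    """Hypothetical handler that serializes writes with a lock, like WSHandler."""

    def __init__(self, sock):
        self.sock = sock
        self.write_lock = asyncio.Lock()

    async def write_unawaited(self, message):
        # Older pattern: hold the lock only while *starting* the write,
        # then hand the write future back without waiting on it.
        async with self.write_lock:
            return self.sock.write_message(message)

    async def write_awaited(self, message):
        # Newer pattern: hold the lock until the write future resolves.
        async with self.write_lock:
            await self.sock.write_message(message)


async def demo(method_name):
    sock = FakeSocket()
    write = getattr(Handler(sock), method_name)

    await write("msg 1")            # completes normally
    sock.stalled = True             # every write from now on stalls
    first = asyncio.ensure_future(write("msg 2"))
    await asyncio.sleep(0)          # let "msg 2" take the lock and start writing

    try:
        # Give "msg 3" a short window to get through.
        await asyncio.wait_for(write("msg 3"), timeout=0.1)
    except asyncio.TimeoutError:
        pass                        # blocked behind the lock still held by "msg 2"

    # Clean up the task that may still be waiting on the stalled write.
    first.cancel()
    try:
        await first
    except asyncio.CancelledError:
        pass
    return sock.sent


print(asyncio.run(demo("write_unawaited")))  # ['msg 1', 'msg 2', 'msg 3']
print(asyncio.run(demo("write_awaited")))    # ['msg 1', 'msg 2']
```

With the older pattern a stalled write only leaves an unresolved future behind, and later messages still go out. With the newer pattern the third message never gets sent. That would be consistent with the server staying alive while the plots freeze, but I have not confirmed that any write actually stalls in our setup.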
I noticed that although the server-side WSHandler was changed, the client-side write_message routine (bokeh/client/websocket.py) still uses the write_message_unlocked function and returns the write future instead of yielding it. Would anyone be willing to share the reasoning behind this change? Does anyone have insight into why it might cause the plots to stop updating after a seemingly random amount of time?
As an additional note, I am working on a stripped-down reproducer for this issue. It is taking some time because I don't know which parts of our application contribute to the problem, and a single test run takes 10-40 minutes. As soon as I have one I am confident in, I will post it here.