Data update issues with streamed data and bokeh server

linhz0hz · July 9, 2020, 3:29am

I am very new to bokeh, and I hit an obstacle while trying to use streaming data with the bokeh server.
What I want is for the server code to keep generating data and serve the same data to every browser session that connects to it. After reading the documentation, I thought this would do it:

import numpy as np

from bokeh.io import curdoc
from bokeh.layouts import column
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Set up data
N = 100
data_source = ColumnDataSource(data={
    'time': np.zeros(0, dtype=np.float32),
    'temperature': np.zeros(0, dtype=np.float32),
    'current': np.zeros(0, dtype=np.float32),
})

# Set up plot
plot_temperature = figure(plot_height=400, plot_width=400, title="temperature",
                          tools="crosshair,pan,reset,save,wheel_zoom",
                          y_range=[20, 30])
plot_temperature.line('time', 'temperature', source=data_source, line_width=3, line_alpha=0.6)

plot_current = figure(plot_height=400, plot_width=400, title="Current",
                      tools="crosshair,pan,reset,save,wheel_zoom",
                      y_range=[20, 30])
plot_current.line('time', 'current', source=data_source, line_width=3, line_alpha=0.6)

index = 0

def update_data():
    # Append one new sample and keep only the last N points
    global index
    new_data = {
        'time': np.array([index]),
        'temperature': np.random.uniform(20, 30, 1),
        'current': np.random.uniform(20, 30, 1),
    }
    data_source.stream(new_data, rollover=N)
    index += 1
    print("update:", index)

# Set up layouts and add to document
plot_area = column(plot_temperature, plot_current, width=500)

curdoc().add_periodic_callback(update_data, 50)
curdoc().add_root(plot_area)
curdoc().title = "Environment Log"

Here is what confuses me:
First, I want the server to generate data even when no browser is connected to it. That is not the case: it only starts once a connection is made.
Second, if I open only one connection and keep it open, everything works fine. But once I open a different connection, the data is not updated properly.
I suspect I am misunderstanding the concept of a Document in bokeh, but I do not know how to achieve what I want.

I also checked this example https://github.com/bokeh/bokeh/tree/2.1.1/examples/app/ohlc and it seems to have the same behavior: it only works with the first opened session.

p-himik · July 9, 2020, 6:52am

In Bokeh, documents are created only when requested, i.e. by opening pages that request corresponding Bokeh apps’ URLs.

As for running some code when there are no connections - you can either use lifecycle hooks or embed Bokeh as a library in your own code. This way, you would be able to start a separate thread or use the IOLoop.
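The lifecycle-hook route could look roughly like the sketch below. It is only an illustration under some assumptions: the hooks file of a directory-format app is `app_hooks.py` (older releases may call it `server_lifecycle.py`), `read_sensors` and `shared_buffer` are made-up names, and storing the buffer as an attribute on `server_context` is a convention I am assuming works, not a documented API.

```python
import random
from collections import deque

N = 100  # same history length as the rollover in the first post


def read_sensors(index):
    """Hypothetical stand-in for real sensor reads (random values, as in the original app)."""
    return {
        "time": index,
        "temperature": random.uniform(20, 30),
        "current": random.uniform(20, 30),
    }


def on_server_loaded(server_context):
    """Lifecycle hook: called once when the server starts, before any browser connects."""
    buffer = deque(maxlen=N)
    # Assumption: an attribute set here can later be reached from each session.
    server_context.shared_buffer = buffer
    state = {"index": 0}

    def acquire():
        # Append one sample; the deque drops the oldest once it holds N rows.
        buffer.append(read_sensors(state["index"]))
        state["index"] += 1

    # Scheduled on the server's event loop every 50 ms, independent of sessions.
    server_context.add_periodic_callback(acquire, 50)
```

Because the callback belongs to the server and not to a document, it keeps running with zero connections. The hook only fills a buffer; each session still has to copy the rows into its own data source.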

linhz0hz · July 9, 2020, 12:13pm

Thanks. This explained the first part. However, why is the stream update not synchronized? Although documents are only created when requested, they should still be the same across browser sessions? For example if I update all of my data directly instead of using streams, I should see the same content across all browser sessions?

_jm · July 9, 2020, 12:46pm

@linhz0hz

I think this statement might be a source of the confusion.

Each session is unique and has its own document associated with it. If user A and user B connect to your server and interact with the application, their actions are intentionally separate and state is not shared.

This excerpt from the bokeh server documentation explains things: Server Architecture - Applications, Sessions, and Connections

Sessions have a 1-1 relationship with instances of bokeh.document.Document: each session has a document instance. When a browser connects to the server, it gets a new session; the application fills in the session’s document with whatever plots, widgets, or other content it desires.

linhz0hz · July 9, 2020, 1:58pm

Thanks for the clarification. So I guess my callback should be registered to the application instead of the document? Would the best way be adding a periodic callback to the server_context in on_server_loaded? I am looking at this example https://github.com/bokeh/bokeh/tree/2.1.1/examples/app/spectrogram which might be what I should follow?

What I want to build is a lab-monitor program that runs on boards like the Raspberry Pi. A bokeh application would obtain sensor readings and visualize them on any device in the local network, and I would like every session to see the same data.

_jm · July 9, 2020, 3:36pm

@linhz0hz

I expect that the best solution really depends on the details of your system such that I would not be able to recommend how you should proceed at a low level.

With that caveat, I chose the following architecture for some systems that are at least conceptually similar to what you describe.

(1) A dedicated embedded platform handles sensor configuration, data acquisition, and management. Depending on the requirements, it also logs readings to a database and runs a TCP/IP server that can stream the data over ethernet to clients that connect.

I have done this with both high-end systems-on-a-chip and commercial single-board computers like the pi. In my cases, precise acquisition and some real-time control on the embedded platform were always an important concern, so I did not want other things - like a visualization server - running on the same computer.

(2) A second computer runs a bokeh server, and has an ethernet connection to the embedded platform. This could easily be a pi as well. I’ve used Miniconda / berryconda distributions with bokeh on the pi before and it was more than sufficient for carefully written visualization UIs.

The server software establishes a TCP/IP connection to a specific sensor platform requested by a user’s client who accesses the bokeh server. The data from the TCP/IP stream are parsed and dispatched to bokeh data sources to stream in the client’s browser, for example.

In general, my problems have been M-to-1-N, where M is the number of clients/users/consumers who want to interact with/analyze the data, 1 is the number of bokeh servers, and N is the number of sensor platforms.

I hope this helps.

linhz0hz · July 9, 2020, 9:19pm

Thanks for the reply. I will compare these approaches later. But whether the data is generated locally or streamed in over the network, the core problem I want to solve now is this: how can I stream every piece of new data to all live sessions / documents? It is still a bit unclear to me how to do that reliably.

_jm · July 9, 2020, 9:24pm

Understood.

If the data are acquired locally rather than over the network, you might have a separate thread or process that does the data acquisition from the sensors, and then use pipes or a similar communication mechanism to get the data to the bokeh server sessions.
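A minimal sketch of the separate-thread variant, using only the standard library. The class name `Acquisition` and the `read_fn` callable are hypothetical; a real `read_fn` would talk to the sensors.

```python
import threading
from collections import deque


class Acquisition:
    """Background thread that samples read_fn and keeps the last maxlen rows."""

    def __init__(self, read_fn, maxlen=100, period=0.05):
        self._read_fn = read_fn
        self._period = period
        self._lock = threading.Lock()
        self._rows = deque(maxlen=maxlen)
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        index = 0
        while not self._stop.is_set():
            row = self._read_fn(index)
            with self._lock:
                self._rows.append(row)
            index += 1
            # Sleep for one period, but wake up early if stop() is called.
            self._stop.wait(self._period)

    def snapshot(self):
        """Return a copy of the buffered rows; safe to call from other threads."""
        with self._lock:
            return list(self._rows)
```

The sessions never touch the deque directly; they call `snapshot()` and work on the copy, so the acquisition loop is not blocked by slow readers for longer than the copy takes.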

Or the sensor acquisition task logs the data to a file or database (so that you have an archive of it), and the sessions read from that database for visualization.
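As a rough sketch of the database variant, here is one way it might look with SQLite from the standard library. The table layout and the function names are made up for illustration; each session would remember the last row id it has plotted and ask only for newer rows.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (and if needed create) the readings log. Use a file path in practice."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "t REAL, temperature REAL, current_value REAL)"
    )
    conn.commit()
    return conn


def log_reading(conn, t, temperature, current_value):
    """Called by the acquisition task for every new sample."""
    conn.execute(
        "INSERT INTO readings (t, temperature, current_value) VALUES (?, ?, ?)",
        (t, temperature, current_value),
    )
    conn.commit()


def rows_after(conn, last_id):
    """Called by a session: fetch every row it has not seen yet, oldest first."""
    cursor = conn.execute(
        "SELECT id, t, temperature, current_value FROM readings "
        "WHERE id > ? ORDER BY id",
        (last_id,),
    )
    return cursor.fetchall()
```

With a file-backed database, each session would open its own connection and keep its own `last_id`, so all sessions converge on the same data regardless of when they connected.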

Or …

linhz0hz · July 9, 2020, 9:39pm

Let’s say I have a separate thread doing the data acquisition, and have a pipe or thread-safe way to pass data around. But the bokeh server only has one copy of that data, and needs to update more than one document.

_jm · July 9, 2020, 10:14pm

Okay. If you want to deal with the one-writer-to-multiple-readers problem yourself, you could use Python’s multiprocessing Queue or something similar.
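One thing to keep in mind is that a single queue hands each item to exactly one consumer, so for several readers you would need one queue per reader. A minimal sketch of that fan-out, assuming the writer and the readers live in the same process so the thread-based `queue.Queue` is enough (the `Broadcaster` class and `drain` helper are hypothetical names):

```python
import queue
import threading


class Broadcaster:
    """One writer, many readers: every subscriber gets its own queue."""

    def __init__(self):
        self._lock = threading.Lock()
        self._subscribers = []

    def subscribe(self):
        """Register a new reader (e.g. one per session) and return its queue."""
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q):
        """Remove a reader, e.g. when its session goes away."""
        with self._lock:
            self._subscribers.remove(q)

    def publish(self, item):
        """Put the same item on every subscriber's queue."""
        with self._lock:
            targets = list(self._subscribers)
        for q in targets:
            q.put(item)


def drain(q):
    """Return everything currently waiting in q without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items
```

A session that subscribes late only sees items published after it subscribed, so this suits live updates better than replaying history.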

If you want to do everything in bokeh and not deal with that, you could explore adding logic to the server’s lifecycle hooks in app_hooks.py, for example in the on_server_loaded() function, so that it runs when the server is loaded.

You would then need to find out what is and is not possible for passing the data between that server_context and the session_context of each client session/document.
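One possible shape for the session side, untested against a real server and offered only as a sketch: each session keeps its own "last seen" index and copies newer rows from a shared buffer into its own data source. The helper `push_new_rows` is a made-up name, and the buffer is assumed to hold dicts with `time`, `temperature`, and `current` keys where `time` increases with every sample.

```python
def push_new_rows(shared_buffer, data_source, last_seen, rollover=100):
    """Stream rows newer than last_seen into one session's data source.

    Returns the updated last_seen value for the caller to store.
    """
    # Copy first so the writer can keep appending while we filter.
    rows = [row for row in list(shared_buffer) if row["time"] > last_seen]
    if not rows:
        return last_seen
    columns = ("time", "temperature", "current")
    data_source.stream(
        {key: [row[key] for row in rows] for key in columns},
        rollover=rollover,
    )
    return rows[-1]["time"]


# Hypothetical use inside main.py, which runs once per session. It assumes a
# server-level hook stored the buffer on server_context as `shared_buffer`:
#
#   buffer = curdoc().session_context.server_context.shared_buffer
#   state = {"last": -1}
#
#   def update_data():
#       state["last"] = push_new_rows(buffer, data_source, state["last"])
#
#   curdoc().add_periodic_callback(update_data, 50)
```

The periodic callback still belongs to each document, but it no longer generates data. It only copies what the shared buffer already holds, so every session ends up plotting the same samples.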

I personally don’t know what is possible at that level without experimenting with it.