Streaming Plot Won't Update in Browser

WxmanJ · December 19, 2019, 10:28pm

I have a time series setup that should check for a data update every 5 seconds. My issue is the web browser plot does not seem to update from the server callback. I assume I am missing something obvious somewhere.

The file test.txt is the data file and will contain any data updates.

import pandas as pd
from bokeh.plotting import figure, show, ColumnDataSource, output_file
from bokeh.models import ColumnDataSource, HoverTool, WheelZoomTool, BoxAnnotation, Select, CrosshairTool, Span, Range1d
from bokeh.palettes import Spectral3
from bokeh.io import output_notebook, curdoc, reset_output
from bokeh.layouts import column
import numpy as np
reset_output()
output_notebook()

output_file('/home/awips/time_series_update.html')
df = pd.read_csv(('/tmp/app/time_series/data/test.txt'))

def update():
    new_data = pd.read_csv(('/tmp/app/time_series/data/test.txt'))
    result = {'DATE': new_data['DATE'], 'WS': new_data['WS'], 'WSMAX': new_data['WSMAX'], 'P': new_data['P'], 'T': new_data['T'], 'RH': new_data['RH'], 'WD': new_data['WD']}
    print(result)
    source.stream(result, 5)
    print("Check to see if update is working")
    return

df['DATE'] = pd.to_datetime(df['DATE'], format='%Y-%m-%d %H:%M:%S')
df['WS'] = df['WS']*1.94384
df['WSMAX'] = df['WSMAX']*1.94384
                                  
grouped = df.groupby('DATE').sum()
source = ColumnDataSource(grouped)

p1 = figure(x_axis_type='datetime', title='NAM 0-30mb AGL Forecast Wind Speed', plot_width=800, plot_height=400)
p1.y_range=Range1d(0,30)
p1.line(x='DATE', y='WS', line_width=2, source=source, color='black')
p1.triangle(x='DATE', y='WS', size=12, source=source, line_color='black', fill_color='red')
p1.yaxis.axis_label = 'Wind Speed (knots)'
p1.axis.axis_label_text_font_style = "bold"

high_box = BoxAnnotation(bottom=15.0, fill_alpha=0.1, fill_color='red')
p1.add_layout(high_box)


p1.add_tools(HoverTool(show_arrow=True, line_policy='nearest', tooltips=[
    ('Date', '@DATE'),('Wind Speed', '@WS')
]))
p1.add_tools(CrosshairTool())
p1.toolbar.active_scroll = p1.select_one(WheelZoomTool)
p1.toolbar.logo = None
p1.toolbar_location = None
new = Span(location=15.0, dimension='width', line_color='red',line_dash='dashed', line_width=3)
p1.add_layout(new)

#show(p1)
curdoc().add_root(p1)
curdoc().add_periodic_callback(update, 5000)
curdoc().title = "METAR DATA"

Bryan · December 19, 2019, 11:23pm

Is this in the notebook? Or are you running it with bokeh serve?

WxmanJ · December 20, 2019, 5:03am

I am running it with bokeh serve.

WxmanJ · December 21, 2019, 12:14am

Here is an updated script. It does not utilize the streaming function because the streaming function didn’t seem to update the graphic when plotting. The issue with this script is it keeps copying over itself. I have not found a way to basically just refresh the graphic with new data when it arrives.

import os.path, time
import pandas as pd
from bokeh.plotting import figure, show, ColumnDataSource, output_file
from bokeh.models import ColumnDataSource, HoverTool, WheelZoomTool, BoxAnnotation, Select, CrosshairTool, Span, Range1d
from bokeh.io import output_notebook, curdoc, reset_output
from bokeh.layouts import column
import numpy as np
reset_output()
output_notebook()

#output_file('/home/awips/time_series_update.html')

p1 = figure(x_axis_type='datetime', title='NAM 0-30mb AGL Forecast Wind Speed', plot_width=800, plot_height=400)
p1.y_range=Range1d(0,30)
p1.yaxis.axis_label = 'Wind Speed (knots)'
p1.axis.axis_label_text_font_style = "bold"
high_box = BoxAnnotation(bottom=15.0, fill_alpha=0.1, fill_color='red')
p1.add_layout(high_box)
p1.add_tools(CrosshairTool())
p1.toolbar.active_scroll = p1.select_one(WheelZoomTool)
p1.toolbar.logo = None
p1.toolbar_location = None

#hover = (HoverTool(show_arrow=True, line_policy='nearest', tooltips=[
#('Date', '@DATE{%Y-%m-%d %H:%M}'),('Wind Speed', '@WS{%2.1f} kts')], 
#    formatters={'DATE': 'datetime','WS': 'printf',}))
#p1.add_tools(hover)

def plotgraph():
     df = pd.read_csv(('/tmp/app/time_series/data/test.txt'))
     df['WS'] = df['WS']*1.94384
     df['WSMAX'] = df['WSMAX']*1.94384
     df['DATE'] = pd.to_datetime(df['DATE'], format='%Y-%m-%d %H:%M:%S')
     #hover = HoverTool(tooltips=[("Date", "@DATE{%Y-%m-%d %H:%M}"),("Wind Speed", "@WS{%2.1f} kts")])
     source = ColumnDataSource(df)
     p1.line(x='DATE', y='WS', line_width=2, source=source, color='black')
     p1.triangle(x='DATE', y='WS', size=12, source=source, line_color='black', fill_color='red')
     hover = (HoverTool(show_arrow=True, line_policy='nearest', tooltips=[('Date', '@DATE{%Y-%m-%d %H:%M}'),('Wind Speed', '@WS{%2.1f} kts')], 
        formatters={
            'DATE': 'datetime',
            'WS': 'printf',
        }))
     p1.add_tools(hover)
     hover.mode = "vline"
    #    hover.mode = "vline"

def update():
     global current
     newtime = time.ctime(os.path.getmtime("/tmp/app/time_series/data/test.txt"))
     if current < newtime: #File has been modified so plot the new points
        plotgraph()
        current = newtime

current = time.ctime(os.path.getmtime("/tmp/app/time_series/data/test.txt"))
plotgraph() #plot graph for the first time before iterating the update loop
curdoc().add_root(p1)
curdoc().add_periodic_callback(update, 1000)
curdoc().title = "METAR DATA"
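One detail in the script above may be worth checking: update() compares two time.ctime() strings with `<`, and text comparison is alphabetical rather than chronological. A minimal sketch of comparing the numeric modification times instead follows; the helper name file_changed is hypothetical, and the two timestamps are made-up sample values. Separately, every call to plotgraph() adds a new line and a new set of triangles to p1, which is probably why the plot "keeps copying over itself".

```python
import calendar
import os
import time


def file_changed(path, last_mtime):
    """Return (changed, mtime), comparing numeric modification times."""
    mtime = os.path.getmtime(path)  # float: seconds since the epoch
    return mtime > last_mtime, mtime


# Why the string form is risky: a Saturday and the following Monday (UTC).
sat = calendar.timegm((2019, 12, 21, 12, 0, 0))
mon = calendar.timegm((2019, 12, 23, 12, 0, 0))
sat_text = time.asctime(time.gmtime(sat))  # same layout as time.ctime()
mon_text = time.asctime(time.gmtime(mon))

print(sat < mon)            # True: numbers compare chronologically
print(sat_text < mon_text)  # False: "Sat..." sorts after "Mon..."
```

In the periodic callback, the returned mtime would be stored and passed back in on the next call, in place of the `current` string.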

Bryan · December 21, 2019, 1:23am

It definitely does do that, if it is used correctly, as you can verify by running the OHLC example locally. There are also many unit and integration tests that continuously maintain the stream method functionality.

Since the example above is not complete (the data it relies on has not been provided), I can’t run it myself to experiment and determine the exact source of the usage error. However one certain problem is this:

result = {'DATE': new_data['DATE'], 'WS': new_data['WS'], 'WSMAX': new_data['WSMAX'], 'P': new_data['P'], 'T': new_data['T'], 'RH': new_data['RH'], 'WD': new_data['WD']}

All the values in this dict must be columns, i.e. lists, series, or arrays. That’s true even if you are streaming only a single new value—the values should be lists/arrays of length one. But your dict values are not columns, as expected, they are single numbers.
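As a rough illustration of the "columns" requirement, here is a minimal sketch that streams one reading at a time. It assumes a source with only the DATE and WS columns from the script above; the helper name stream_reading and the sample values are made up for the example.

```python
from datetime import datetime

from bokeh.models import ColumnDataSource

# Start with empty columns; stream() must be given every column of the source.
source = ColumnDataSource(data={"DATE": [], "WS": []})


def stream_reading(source, date, ws, rollover=5):
    """Append one reading; each value is wrapped in a length-one list."""
    source.stream({"DATE": [date], "WS": [ws]}, rollover=rollover)


# Made-up sample readings, one per hour.
for hour in range(7):
    stream_reading(source, datetime(2019, 12, 19, hour), 10.0 + hour)

# rollover=5 keeps only the five most recent rows.
print(len(source.data["WS"]))
```

Streaming a dict whose keys do not match the source's columns, or whose columns differ in length, raises a ValueError rather than silently doing nothing.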

Michael_Heitmeier · September 8, 2020, 4:03pm

Sorry to jump into this but I also struggle to get streaming to work. I have other code that also uses dataframes and that works but I cannot figure out for the life of me why the following does not show the updates. As far as I can tell I’m not using single numbers at least.
Thanks for any hints!

import pandas as pd
import numpy as np
import time
from bokeh.io import show, output_notebook
from bokeh.layouts import row
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource
output_notebook()

df = pd.DataFrame(np.array([
       [0.90, 1.90, 2.90],
       [0.95, 1.97, 2.93],
       [1.00, 2.00, 3.00],
       [1.10, 2.10, 3.10]]),
       columns=('x', 'y', 'z'))

plot_data_p1 = ColumnDataSource(df.loc[[0]][['x','y']])
plot_data_p2 = ColumnDataSource(df.loc[[0]][['x','z']])

opts = dict(plot_width=400, plot_height=400, min_border=0)

p1 = figure(**opts)
p1.scatter('x','y', source=plot_data_p1, alpha=0.5)
p2 = figure(**opts)
p2.scatter('x','z', source=plot_data_p2, alpha=0.5)
t = show(row(p1, p2))#, notebook_handle=True)

def update(df_new):
    plot_data_p1.stream(df_new[['x','y']])
    plot_data_p2.stream(df_new[['x','z']])

# Quick hack to simulate new data coming in
for a in range(1,df.shape[0]):
    update(df.loc[[a]])
    time.sleep(0.1)

Bryan · September 8, 2020, 5:15pm

You have to call push_notebook on the notebook handle after the stream calls. Unless you have embedded an actual Bokeh server application (not the case here), all Python → JS synchronization is explicit, via a call to push_notebook.
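A minimal sketch of that pattern, assuming the handle comes from show(..., notebook_handle=True) (show only returns a handle when that argument is set). The helper name stream_and_push and its `push` parameter are hypothetical; the parameter exists only so the function can be exercised outside a notebook.

```python
from bokeh.io import push_notebook
from bokeh.models import ColumnDataSource


def stream_and_push(source, new_data, handle, push=push_notebook):
    """Stream new rows, then explicitly sync the notebook output."""
    source.stream(new_data)
    push(handle=handle)


# Intended use inside a notebook (not executed here):
#   output_notebook()
#   t = show(row(p1, p2), notebook_handle=True)
#   stream_and_push(plot_data_p1, {"x": [0.95], "y": [1.97]}, t)
```

When several sources feed the same output, as with plot_data_p1 and plot_data_p2 above, a single push_notebook call after all the stream calls should be enough.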

Michael_Heitmeier · September 9, 2020, 12:22pm

Excellent, that works, thank you very much!

Funny enough this code https://discourse.bokeh.org/t/bokeh-app-does-not-display-in-jupyter-notebook-using-python-3-8-3-is-this-a-bug/5729 works for me even though it does not use push_notebook. Why would that be?

For a total beginner like me it may be helpful to add “push_notebook” as the last line in the example given under https://docs.bokeh.org/en/latest/docs/user_guide/data.html#streaming, to an unsuspecting newbie it looks like “source.stream(new_data)” is all the magic that is needed.

I came to Bokeh via Matplotlib, pyplot and pyqtgraph but they all pale in comparison to the clean interface and the streaming which is the most important for me at the moment. Now if it would also do 3D scatter plots it would be perfect

Michael_Heitmeier · September 9, 2020, 2:19pm

With many push_notebook calls (around 2000) I get this error:

AttributeError Traceback (most recent call last)
/usr/lib/python3/dist-packages/ipykernel/iostream.py in _event_pipe(self)
96 try:
---> 97 event_pipe = self._local.event_pipe
98 except AttributeError:

AttributeError: '_thread._local' object has no attribute 'event_pipe'

During handling of the above exception, another exception occurred:

ZMQError Traceback (most recent call last)
_ctypes/callbacks.c in 'calling callback function'()

in data_handler(self, ctx, data)
60 plot_data_p1.stream(new_data1)
61 plot_data_p2.stream(new_data2)
---> 62 push_notebook(handle=t)
63 #print(new_data1, new_data2)
64 self.samples+=1

/home/pi/.local/lib/python3.7/site-packages/bokeh/io/notebook.py in push_notebook(document, state, handle)
264 msg = Protocol().create("PATCH-DOC", events)
265
---> 266 handle.comms.send(msg.header_json)
267 handle.comms.send(msg.metadata_json)
268 handle.comms.send(msg.content_json)

/usr/lib/python3/dist-packages/ipykernel/comm/comm.py in send(self, data, metadata, buffers)
119 """Send a message to the frontend-side version of this comm"""
120 self._publish_msg('comm_msg',
---> 121 data=data, metadata=metadata, buffers=buffers,
122 )
123

/usr/lib/python3/dist-packages/ipykernel/comm/comm.py in _publish_msg(self, msg_type, data, metadata, buffers, **keys)
69 parent=self.kernel._parent_header,
70 ident=self.topic,
---> 71 buffers=buffers,
72 )
73

/usr/lib/python3/dist-packages/jupyter_client/session.py in send(self, stream, msg_or_type, content, parent, ident, buffers, track, header, metadata)
746 # use dummy tracker, which will be done immediately
747 tracker = DONE
---> 748 stream.send_multipart(to_send, copy=copy)
749
750 if self.debug:

/usr/lib/python3/dist-packages/ipykernel/iostream.py in send_multipart(self, *args, **kwargs)
260 def send_multipart(self, *args, **kwargs):
261 """Schedule send in IO thread"""
---> 262 return self.io_thread.send_multipart(*args, **kwargs)
263
264

/usr/lib/python3/dist-packages/ipykernel/iostream.py in send_multipart(self, *args, **kwargs)
210 If my thread isn't running (e.g. forked process), send immediately.
211 """
---> 212 self.schedule(lambda : self._really_send(*args, **kwargs))
213
214 def _really_send(self, msg, *args, **kwargs):

/usr/lib/python3/dist-packages/ipykernel/iostream.py in schedule(self, f)
201 self._events.append(f)
202 # wake event thread (message content is ignored)
---> 203 self._event_pipe.send(b'')
204 else:
205 f()

/usr/lib/python3/dist-packages/ipykernel/iostream.py in _event_pipe(self)
99 # new thread, new event pipe
100 ctx = self.socket.context
---> 101 event_pipe = ctx.socket(zmq.PUSH)
102 event_pipe.linger = 0
103 event_pipe.connect(self._event_interface)

/usr/lib/python3/dist-packages/zmq/sugar/context.py in socket(self, socket_type, **kwargs)
144 if self.closed:
145 raise ZMQError(ENOTSUP)
---> 146 s = self._socket_class(self, socket_type, **kwargs)
147 for opt, value in self.sockopts.items():
148 try:

/usr/lib/python3/dist-packages/zmq/sugar/socket.py in __init__(self, *a, **kw)
57
58 def __init__(self, *a, **kw):
---> 59 super(Socket, self).__init__(*a, **kw)
60 if 'shadow' in kw:
61 self._shadow = True

zmq/backend/cython/socket.pyx in zmq.backend.cython.socket.Socket.__init__()

ZMQError: Too many open files

Bryan · September 9, 2020, 4:09pm

@Michael_Heitmeier The reason the first link works without push_notebook is that that example is embedding a real Bokeh server application. We can’t say to use push_notebook in the stream docs because most usage of stream does not require it. In fact most usage of stream is in Bokeh applications outside the notebook altogether, where it would in fact generate a runtime error to try to execute push_notebook. Calling push_notebook is only needed/possible:

when inside the notebook, and
when not running a Bokeh app

The stream method was not actually created with push_notebook in mind, at all. It was created very specifically to be a feature of Bokeh server apps, but apparently incidentally functions in some limited capacity with push_notebook.

Which actually gets to the last post. I don’t know what that specific error is (I have never encountered it personally). But in general, push_notebook is very old and also severely limited compared to an embedded Bokeh server application. E.g. as the name states, it can only push updates in one direction, from Python to JS. It’s not possible to respond to events like selections that originate on the JS side with push_notebook. It only exists because it was what was possible at the time, ~6 years ago, before the modern Bokeh server existed as an option. Now that it’s possible to easily embed a real Bokeh server app, I would recommend that over using push_notebook in almost all cases.

Michael_Heitmeier · September 12, 2020, 3:46pm

Please set me straight, I’m afraid I just don’t get it. For how the embedding is done I looked at the ocean temperature example and learned that ‘show(x)’ triggers it. Trying to combine this with streaming from the example above does not paint the updates however. To me the structure of the code looks the same, just without the button, but something must be missing and I can’t figure out what. I also started with the ocean temp example and replaced the callback with a loop that contains the stream updates and was unsuccessful with that as well. What magic triggers the repaint? In neither example I see how that happens. My simple mind was intuitively drawn to ‘push_notebook’ as I want to trigger a repaint after an update, but apparently that’s not how it works.
Here is how far I got, which only shows the final plots but not all the steps in between:

from bokeh.io import output_notebook, show
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource, HoverTool
import pandas as pd 
import numpy as np
import time
from bokeh.layouts import row

output_notebook()

def modify_doc(doc):
    df = pd.DataFrame(np.array([
       [0.90, 1.90, 2.90],
       [0.95, 1.97, 2.93],
       [1.00, 2.00, 3.00],
       [1.10, 2.10, 3.10],
       [1.90, 2.90, 3.90],
       [1.95, 2.97, 3.93],
       [2.00, 3.00, 4.00],
       [2.10, 3.10, 4.10]]),
       columns=('x', 'y', 'z'))

    plot_data_p1 = ColumnDataSource(df.loc[[0]][['x','y']])
    plot_data_p2 = ColumnDataSource(df.loc[[0]][['x','z']])
    opts = dict(plot_width=400, plot_height=400, min_border=0)
    p1 = figure(**opts)
    p1.scatter('x','y', source=plot_data_p1, alpha=0.5)
    p2 = figure(**opts)
    p2.scatter('x','z', source=plot_data_p2, alpha=0.5)
    layout = row(p1, p2)
    doc.add_root(layout)
    
    for i in range(1,df.shape[0]):
        plot_data_p1.stream(df.loc[[i]][['x','y']])
        plot_data_p2.stream(df.loc[[i]][['x','z']])
        time.sleep(0.5)    
    
show(modify_doc)

Bryan · September 12, 2020, 4:59pm

@Michael_Heitmeier The way a Bokeh server application works is that some code is run to create a Python data structure called a Document that contains all the Bokeh objects (plots, data sources, tools, etc). Then this Python Document is mirrored to a JavaScript version of the Document in a browser (and BokehJS uses this JS Document as instructions for what to render). Then, the Bokeh server keeps these two mirror copies of the document synchronized, so that when one changes, the other changes. That is how callbacks work across the Python / JS runtime boundary. As for repaints, Bokeh automatically repaints things any time a data source changes.

So why do things not “work” in your code above? It’s because the modify_doc app code that creates the Document is more analogous to a recipe. It is run once, up front, [1] and then its work is done. The code above does not result in any visible changes because all the data updates you are making happen before any of the content is ever sent to the “JavaScript side”, much less rendered. The Document is only sent off and synced after the app code has finished setting everything up. That is why only the “last state” is visible.

If you want there to be visible changes, they need to come from inside some callback that is set up by the app code initially but actually runs later, after the content is rendered. This could be something a button or other widget action triggers, but there are also “periodic callbacks” that can execute changes on some regular schedule. There is a good example of starting/stopping periodic callbacks in the Gapminder example.

[1] Technically, once per session, but let’s ignore that for now, especially in the notebook.
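To make the split between "setup" and "later" concrete, here is a minimal sketch of the earlier modify_doc example restructured around a periodic callback. It uses a single plot and plain lists instead of a DataFrame to stay short; the names ROWS and make_update are assumptions introduced for this sketch, not part of Bokeh.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Illustrative data, taken from the x/y columns of the example above.
ROWS = [(0.90, 1.90), (0.95, 1.97), (1.00, 2.00), (1.10, 2.10)]


def make_update(source, rows):
    """Return a zero-argument callback that streams one row per call."""
    pending = iter(rows)

    def update():
        row = next(pending, None)
        if row is None:  # nothing left to stream
            return
        x, y = row
        # Values must be columns, so wrap each scalar in a length-one list.
        source.stream({"x": [x], "y": [y]})

    return update


def modify_doc(doc):
    # Setup: runs once, before anything is sent to the browser.
    first, rest = ROWS[0], ROWS[1:]
    source = ColumnDataSource(data={"x": [first[0]], "y": [first[1]]})
    p = figure()
    p.scatter("x", "y", source=source, alpha=0.5)
    doc.add_root(p)
    # Updates: run later, every 500 ms, once the document has been synced.
    doc.add_periodic_callback(make_update(source, rest), 500)
```

In a notebook this would be displayed with show(modify_doc). No time.sleep loop and no push_notebook call are involved, because the server keeps the two copies of the Document in sync after each callback.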

Michael_Heitmeier · September 16, 2020, 1:30pm

Thank you for the patient explanation. I can’t claim to understand all of it but I take it that the complex interaction with the browser is at least partially to blame for the difficult setup. I tried to replace the button action with a periodic callback but could not figure out how to pass an index as a parameter into the update function so that I can loop through my dataframe. In other words I’m giving up on bokeh at this point, having the zoom and all those good things would be nice but it seems I’d need a computer science degree to really understand it and that is just too much. I’ll go with simple Tkinter-based graphics instead but I’ll definitely keep bokeh in mind for more straightforward visualizations.

Bryan · September 16, 2020, 4:22pm

@Michael_Heitmeier You can’t pass an index directly, because Bokeh calls the callback function, not you (which also means all callbacks must have the exact same fixed set of parameters). I have seen this as a point of confusion in other callback-centric tools and libraries, so it seems like a common stumbling block. The Gapminder example uses the slider value itself as the “counter” since that is accessible inside the callback code, but if you don’t want or need a widget, there are a couple of ways to accomplish what you want:

use a global counter
```
i = 0
def callback():
    global i
    i = i + 1 

curdoc().add_periodic_callback(callback, 200)
```
every session that’s opened runs in a private module, so this is only “global” per-session.

A little cleaner: use one of the decorators in bokeh.driving that were created for exactly this situation:

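from bokeh.driving import repeat
from bokeh.io import curdoc

N = 10  # example cycle length; use whatever fits your data

@repeat(range(N))
def callback(i):
    # Every time the callback is called it will get a new value
    # for `i` that counts from [0, N) and then repeats
    print(i)  # placeholder body; put the real update here

curdoc().add_periodic_callback(callback, 200)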
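Putting the decorator together with the earlier question, here is a rough sketch that walks through a DataFrame one row per callback. The data is a made-up subset of the example frame, and the rollover choice is an assumption: because repeat wraps around after the last row, rollover keeps the source from growing without bound.

```python
import pandas as pd
from bokeh.driving import repeat
from bokeh.io import curdoc
from bokeh.models import ColumnDataSource

# Made-up sample data with the same x/y columns as the example above.
df = pd.DataFrame(
    [[0.90, 1.90], [0.95, 1.97], [1.00, 2.00], [1.10, 2.10]],
    columns=["x", "y"],
)

source = ColumnDataSource(data={"x": [], "y": []})


@repeat(range(len(df)))
def callback(i):
    # Bokeh calls this with no arguments; the decorator supplies `i`.
    row = df.iloc[i]
    source.stream(
        {"x": [float(row["x"])], "y": [float(row["y"])]},
        rollover=len(df),  # keep at most one full pass of rows
    )


curdoc().add_periodic_callback(callback, 200)
```

A plot created with source=source and added via curdoc().add_root(...) would then gain one point every 200 ms. If the rows should be shown only once, the iterator-based closure or the global counter is probably a better fit than repeat.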