Multiple datasource patch/stream

asmodehn · May 19, 2020, 4:44pm

Hi everyone,

I have been playing around with bokeh for the last few weeks, and I managed to do some nice things.

However, I still cannot get multiple data sources to patch/stream to the webpage/document, whether in a table or in a figure… This should be doable, right?

I was wondering if someone already has working code for this use case that I could use as an example?

Thanks for any help…

Ronald_Truong · May 19, 2020, 5:53pm

Have you tried the example in the docs? This worked really well when I was trying to figure out this library :).

Bryan · May 19, 2020, 9:07pm

Hi @asmodehn, it would also really help focus the discussion if you shared actual code for what you have attempted.

asmodehn · May 20, 2020, 12:43pm

Thanks for the quick reply.

Yes, I’ve seen the documentation; I guess I was looking for a more ‘involved’ code example…
By the way, I am running the server from async code, which I couldn’t find any documentation about (it seems like it would fit right into the “Running a Bokeh server” page of the Bokeh 2.4.2 documentation).

The code I am using for testing things is here:

livebokeh/datamodel.py at e5319f097c4b1ec318c085a6fe365b865d207e5e · asmodehn/livebokeh · GitHub
livebokeh/dataview.py at e5319f097c4b1ec318c085a6fe365b865d207e5e · asmodehn/livebokeh · GitHub
These are two modules that are executable on their own (dataview depends on datamodel) to test dynamic table and plot updates… The code is still full of WIP comments, sorry about that.
I feel the use case is a bit different from the usual Bokeh one, as I want the datamodel to drive the visualization (and not the other way around).

After a bit more testing, it seems that calling document.add_periodic_callback() once, with a “composed” callback that adds the callbacks, works as expected. But when I call document.add_next_tick_callback() multiple times, only one of the callbacks is actually taken into account…

I’m not sure if I am holding this the wrong way, or if there is a problem hidden somewhere…
Thanks for the help.

Bryan · May 20, 2020, 3:54pm

I’m sorry I don’t have time to dig in to the linked code just now but I wanted to drop in a link to this example that patches three separate sources:

https://github.com/bokeh/bokeh/blob/master/examples/howto/patch_app.py

It’s a “standard” Bokeh app, which sounds like it is different from your usage, but perhaps it is useful.

I don’t really know anything about this:

But calling multiple times document.add_next_tick_callback() , only one is actually taken into account…

The most helpful thing here would be a tiny minimal reproducer, so that there is no ambiguity about the behavior being observed.

asmodehn · May 20, 2020, 5:26pm

I have tried to write a minimal example here that reflects my use case:

https://github.com/asmodehn/livebokeh/blob/master/bokeh_debug.py

from datetime import datetime

import pandas
import typing

from bokeh.layouts import row
from bokeh.models import ColumnDataSource, DataTable, TableColumn
from bokeh.plotting import Figure


class Clock:

    data: pandas.DataFrame

    _cds: typing.List[ColumnDataSource]

    @property
    def source(self) -> ColumnDataSource:
        cds = ColumnDataSource(self.data)
        self._cds.append(cds)

(The embedded preview is truncated here; the full file is at the link above.)

I’m focusing here only on the usage of stream, attempting to drive the webpage update…

Doing this, I encounter yet another strange behavior: it seems that, somehow, the DataTable on the webpage actually “grabs” all the stream updates… so I am still very confused…
I couldn’t find any issue with the Python code so far; the callbacks are called as expected.

Anyway I hope this will help as a clean codebase.

Bryan · May 20, 2020, 5:39pm

@asmodehn Python scope / lambda capture does not work the way you seem to be expecting:

In [5]: funcs = []

In [6]: for x in ("foo", "bar", "baz"):
   ...:     funcs.append(lambda: print(x))
   ...:

In [7]: for f in funcs: f()
baz
baz
baz

That would explain why only one CDS gets updated.

Is there a reason not to put the CDS loop inside a single lambda, instead of at the top level? If so, your best bet is probably to use functools.partial to bake in an argument value to the callback.
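
A minimal sketch of the functools.partial suggestion, using the same three strings as the session above. The report function and the list names are made up for illustration; in a real app the callback would be whatever updates one particular CDS.

```python
from functools import partial


def report(name):
    # Hypothetical stand-in for a callback that would update one CDS.
    return name


names = ("foo", "bar", "baz")

# Late binding: every lambda looks up `name` when it is called,
# which happens after the loop has finished.
late_bound = []
for name in names:
    late_bound.append(lambda: report(name))

# partial() stores the value `name` has at the moment the callback is created.
early_bound = []
for name in names:
    early_bound.append(partial(report, name))

late_results = [f() for f in late_bound]
early_results = [f() for f in early_bound]

print(late_results)   # ['baz', 'baz', 'baz']
print(early_results)  # ['foo', 'bar', 'baz']
```

The partial objects take no further arguments here, so they can be passed anywhere a zero-argument callable is expected.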

asmodehn · May 21, 2020, 9:15am

Ah, thanks a lot.
Something I really didn’t expect and would’ve spent weeks looking for.
https://docs.python.org/3/faq/programming.html#why-do-lambdas-defined-in-a-loop-with-different-values-all-return-the-same-result

Indeed, the different behaviors I saw now make more sense…
I could fix it with the fix mentioned in the FAQ, using a default argument to save the value.
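
For reference, a small sketch of that default-argument fix. The names callbacks and label are illustrative only and do not come from the linked code.

```python
callbacks = []
for label in ("foo", "bar", "baz"):
    # `label=label` evaluates the default right now, inside this iteration,
    # so each lambda keeps its own copy of the value.
    callbacks.append(lambda label=label: label)

results = [cb() for cb in callbacks]
print(results)  # ['foo', 'bar', 'baz']
```

This relies on the callback being invoked with no arguments; if the caller passes a positional argument, it replaces the saved default.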

I don’t think I can put the CDS loop inside a single lambda, as each of these must be called for potentially different documents (created for potentially different web requests), and I don’t know when their next tick is going to happen…
From the API at least, it seems to me that:

A CDS belongs to one document
Each document has a (potentially different) tick.
So I must schedule each CDS update into potentially different ticks…
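
A rough sketch of that scheduling idea, written without Bokeh so it runs on its own. StubDocument and StubSource are hypothetical stand-ins for a Document and a ColumnDataSource, and run_tick is an invented helper that plays the role of the server's event loop; only the names add_next_tick_callback, stream and the source's document attribute mirror Bokeh.

```python
from functools import partial


class StubDocument:
    """Hypothetical stand-in for a Bokeh Document: it only queues callbacks."""

    def __init__(self):
        self._next_tick = []

    def add_next_tick_callback(self, callback):
        self._next_tick.append(callback)

    def run_tick(self):
        # Invented helper (not Bokeh API): run and clear the queued callbacks.
        queued, self._next_tick = self._next_tick, []
        for callback in queued:
            callback()


class StubSource:
    """Hypothetical stand-in for a ColumnDataSource owned by one document."""

    def __init__(self, document):
        self.document = document
        self.data = {"x": []}

    def stream(self, new_data):
        for column, values in new_data.items():
            self.data[column].extend(values)


doc_a, doc_b = StubDocument(), StubDocument()
sources = [StubSource(doc_a), StubSource(doc_a), StubSource(doc_b)]

new_rows = {"x": [1, 2]}
for source in sources:
    # partial() binds this particular source, avoiding the late-binding trap.
    source.document.add_next_tick_callback(partial(source.stream, new_rows))

doc_a.run_tick()
after_a = [list(s.data["x"]) for s in sources]
doc_b.run_tick()
after_b = [list(s.data["x"]) for s in sources]

print(after_a)  # [[1, 2], [1, 2], []]
print(after_b)  # [[1, 2], [1, 2], [1, 2]]
```

Each update is queued on the document that owns the source, so two sources in the same document are updated in the same tick while the third waits for its own document.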

Bryan · May 21, 2020, 7:37pm

This is true—in fact every Bokeh model (e.g. Plot, Range1d, whatever) can only belong to exactly one Document. But the converse is not true, of course. A Document can have as many CDS objects as it needs.

Each document has a (potentially different) tick.

I don’t really understand what this is suggesting. In the code you provided, for any given user session, all the CDS are always in one single Document.

asmodehn · May 23, 2020, 9:06am

First part : the Document

Each document has a (potentially different) tick.

I don’t really understand what this is suggesting. In the code you provided, for any given user session, all the CDS are always in one single Document.

I guessed that each web browser connecting to the server potentially creates a document → multiple connections create multiple documents (i.e. instances of Document).

I understand this is what you mean, more precisely, by “for any given user session”.

From the doc:

Sessions have a 1-1 relationship with instances of bokeh.document.Document : each session has a document instance. When a browser connects to the server, it gets a new session

So I potentially have multiple documents to deal with when streaming to update the plots. Even in the code I provided, multiple web browser connections will create multiple documents, right?

Note that my Clock class tries to abstract away the server internals as much as possible; the goal is just to encapsulate a dataframe and provide a (debug-style) visualization of that data when someone connects to the server.

Second part : the tick

The document has an add_next_tick_callback() method, so I guessed that each document potentially has a different tick, which would be why the method lives on the document (instead of somewhere else, like on the server?).

I am using only one server process in this example (my immediate use case), but I am looking for a “generic” design that fits Bokeh’s design, even if I am not sure of the best way to run multiple processes yet…

But, given that the Bokeh server is single-threaded, maybe there is a simpler/better way to get the different ticks in my case? Maybe scheduling callbacks on the tick of the server, via https://docs.bokeh.org/en/latest/docs/dev_guide/server.html#lifecycle for example?