Communicating with bokeh server and manipulating internal data from separate python processes


I have recently run into a use case that I find interesting and that might be common in the research community.

Basically, we are using Bokeh as the front end of our data analysis pipeline. Our intended users are mainly people who are not very familiar with coding and prefer graphical interfaces.

However, we also have seasoned users who wish to inject their own analytical code into the pipeline and manipulate data in an online fashion.

I am wondering if it is possible to sideload an interactive terminal such as IPython in the front end, so that users have the option to run custom code. Alternatively, is there a way to communicate with the Bokeh server from a separate thread and import/export internal data through remote procedure calls? Right now we have a rather ugly solution: we export internal data to a .h5 file, let the user modify it, and then have them upload it back to the server from the browser.

We wonder if there are better solutions.

It seems that the bokeh.client module potentially offers this functionality, but information about it is scarce.



This is a pretty broad topic, and there are various potential approaches. What might work for you will really depend on the specifics of your situation.

First things first. bokeh.client is specifically for synchronizing Bokeh models across different processes. I.e., your analyst would create actual Bokeh ColumnDataSource objects, and update those locally in order to update the remote session. I am guessing that is not actually what you want, and that your analysts only want to work with arrays, dataframes, etc. But if it is what you want, then bokeh.client might be useful. Note that information is scarce on purpose. Early on, users tried to use bokeh.client casually in ways that we can’t really support, so we have strongly de-emphasized it, do not promote it, and really only use it for helping with certain kinds of testing. If you want to use it, you’ll probably need to invest in learning Bokeh at a lower level than most users do.

Other options are things like:

  • sending data to sessions with things like sockets, websockets, zmq, etc.
  • pushing data to external databases, like redis, etc. that a Bokeh session can monitor
  • at the bargain-basement level, write updates to data files that the session can access
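To make the last option concrete, here is a minimal sketch of the file-based approach, assuming the analyst and the Bokeh app can both see the same path and that the data is small enough to pass as JSON. The names `write_update` and `FileWatcher` are hypothetical helpers invented for illustration, not part of Bokeh.

```python
import json
import os
import tempfile


def write_update(path, columns):
    """Analyst side: replace the shared file with new column data."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(columns, handle)
    # Rename into place so a reader never sees a half-written file
    # (the temp file is created in the same directory for that reason).
    os.replace(tmp_path, path)


class FileWatcher:
    """App side: reload the file only when its modification time changes."""

    def __init__(self, path):
        self.path = path
        self._last_mtime = None

    def poll(self):
        """Return the new column data, or None if nothing has changed."""
        try:
            mtime = os.stat(self.path).st_mtime_ns
        except FileNotFoundError:
            return None
        if mtime == self._last_mtime:
            return None
        self._last_mtime = mtime
        with open(self.path) as handle:
            return json.load(handle)
```

Inside the app, `poll()` could be called from a function registered with `curdoc().add_periodic_callback(...)`, assigning any non-None result to a ColumnDataSource's `.data`. Polling on modification time is crude, and it can miss two writes that land within the filesystem's timestamp resolution, but it needs no extra infrastructure.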

In any case, I keep mentioning “session” because that is also a huge consideration. Are these analysts running the Bokeh apps for themselves, and they are the only user of a given Bokeh server? Or is there one Bokeh server and many analysts opening sessions on the same server? Or even more complicated: are you running multiple Bokeh servers behind a load balancer? In that case you won’t even know what process is hosting a session.

In all of these cases, you’ll need to devise a way to communicate the session ID to the analysts, so that the updates they push are “routed” to the correct session when the Bokeh server app code picks them up. E.g., the analysts “push” new data to a database keyed under their session ID using some API you make for them, and the Bokeh app code then picks up the modifications using a pub/sub or polling mechanism.
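As a rough sketch of that routing idea, the following uses SQLite from the standard library as a stand-in for whatever database you actually choose. The table layout and the `push_update` / `pop_updates` functions are hypothetical, made up for this example. It assumes each session has exactly one consumer (the app code for that session) and that the polling variant is acceptable.

```python
import json
import sqlite3


def _connect(db_path):
    """Open the shared database, creating the queue table on first use."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS updates ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "session_id TEXT NOT NULL, "
        "payload TEXT NOT NULL)"
    )
    return conn


def push_update(db_path, session_id, columns):
    """Analyst side: queue new column data for one specific session."""
    conn = _connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO updates (session_id, payload) VALUES (?, ?)",
                (session_id, json.dumps(columns)),
            )
    finally:
        conn.close()


def pop_updates(db_path, session_id):
    """App side: fetch and delete this session's pending updates, oldest first."""
    conn = _connect(db_path)
    try:
        with conn:
            rows = conn.execute(
                "SELECT id, payload FROM updates WHERE session_id = ? ORDER BY id",
                (session_id,),
            ).fetchall()
            # Delete only the rows just read, so a concurrent push is not lost.
            conn.executemany(
                "DELETE FROM updates WHERE id = ?", [(row[0],) for row in rows]
            )
        return [json.loads(row[1]) for row in rows]
    finally:
        conn.close()
```

In the app, the session's own ID should be available as `curdoc().session_context.id`. You could display it in the page for the analyst to copy, then call `pop_updates` with it from a periodic callback. Updates pushed under any other session ID are simply never seen by this session.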

I should be clear: Bokeh has no opinions at all on how any of this might happen. There are just too many options / use-cases and not enough project resources to try to be opinionated here. The necessary parts are all there IMO, but you will definitely be building out some minimal scaffolding that happens to work for you.