Using Bokeh for large data plots embedded in a web application

Marcel_Nemet · January 13, 2016, 12:37pm

Hi, I am writing a web application and I am looking for some plotting tool that is interactive (selecting points, callbacks to server…) and can handle large data, so I am trying Bokeh.

Downsampling.

I will be working with multiple time series. Each of them can have around 1 point per second, which is 24*60*60 = 86400 points per day, or roughly 31M points per year. So downsampling in Python before sending the data to the client is essential for me. At the same time, users should be able to zoom and receive more detailed data. Is it possible to handle such downsampling, e.g. using callback functions? Can you give me a hint how to start?

Embedding

Is it possible to embed a bokeh plot into website and have the data be transferred e.g. using websocket from bokeh server for such plot? I might have 2GB of timeseries data on HDD of server or in HDFS and I want user to be able to view parts of data. E.g. when fully zoomed out to view a downsampled version of timeseries. When embedding js+divs generated by script, div = components(plots) I understand that the data for the plot is embedded within the script. Can this be avoided? Generally I am interested in a combination of embedded plots + dynamic loading of data.

Bokeh server

I am having a hard time understanding the role of the Bokeh server. E.g. when I plot points in Python and use output_server(“name”), is the data for the plot transferred to the Bokeh server over the network? Is it transferred as JSON or binary, or does the Bokeh server use the same in-memory data I worked with in my Python script before constructing the plot? What if I want to change the data, or e.g. the colors of the points, in reaction to user actions, or asynchronously when I measure new data points for my time series? Can I just add new rows to my Pandas DataFrame and issue some kind of update to the server?

Binary data transfer

I have seen a proposal on GitHub regarding binary transfer of data (Look into use of DataViews and ArrayBuffers for more efficient data send/recv · Issue #2204 · bokeh/bokeh · GitHub). I think it is not implemented yet. Is it possible to hack this in, e.g. using callbacks? For example, on zoom, fetch the data from my Python API, where I would downsample and serve it in binary format from Pandas/numpy, and use it as a typed array in JS? Constructing JSON in Python can take a lot of time.

Thank you for any help

Bryan · January 14, 2016, 4:59pm

Hi Marcel,

Responses inline

Hi, I am writing a web application and I am looking for some plotting tool that is interactive (selecting points, callbacks to server...) and can handle large data, so I am trying Bokeh.

Downsampling.
I will be working with multiple timeseries. Each of them can have around 1 point per second, thus this is 24*60*60=86400 points /day. This is 31M points per year. Thus downsampling in python before sending the data to the client is essential for me. At the same time users should be able to zoom and receive more detailed data. Is it possible to handle such downsampling e.g. using callback functions? Can you give me a hint how to start?

I hope to have an actual example to point people to very soon. It's on my list, but there are many, many things on the list and only so many hours in a day. That said, I can give you some general direction. The first step is to look over the examples here:

https://github.com/bokeh/bokeh/blob/master/examples/app

And in particular, maybe the simple example

https://github.com/bokeh/bokeh/blob/master/examples/app/sliders.py

The thing to notice is that any changes you make to a Bokeh model (e.g., updating the .data on a ColumnDataSource) are automatically and transparently mirrored to the browser, which updates. The converse is also true. So putting those two things together, the general outline would be:

* add an .on_change callback to the range(s) on the plot
* that callback computes the downsampled data however you want
* then updates the .data on the data sources
* which updates the plot
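
The outline above can be sketched roughly as follows. This is only an illustration under some assumptions: `downsample` and `attach_downsampling` are hypothetical helper names, the full series is held in sorted Python lists `xs` and `ys`, the plot was created with an explicit numeric x-range (for example `figure(x_range=(xs[0], xs[-1]))`), and plain stride decimation stands in for whatever downsampling you actually want. `plot` is a Bokeh figure and `source` is the `ColumnDataSource` feeding its glyph.

```python
from bisect import bisect_left, bisect_right


def downsample(xs, ys, start, end, max_points=1000):
    """Return at most max_points samples with start <= x <= end.

    xs must be sorted ascending. This is plain stride decimation
    (an assumption for illustration); a min/max bucket scheme
    would preserve spikes better.
    """
    lo = bisect_left(xs, start)
    hi = bisect_right(xs, end)
    n = hi - lo
    step = max(1, -(-n // max_points))  # ceiling division
    return xs[lo:hi:step], ys[lo:hi:step]


def attach_downsampling(plot, source, xs, ys, max_points=1000):
    """Re-downsample the visible window whenever the x-range changes."""

    def on_range_change(attr, old, new):
        start, end = plot.x_range.start, plot.x_range.end
        if start is None or end is None:
            return  # range not initialized yet
        x, y = downsample(xs, ys, start, end, max_points)
        # Replace .data in one assignment so the columns stay the same length.
        source.data = dict(x=x, y=y)

    plot.x_range.on_change("start", on_range_change)
    plot.x_range.on_change("end", on_range_change)
```

In a server app you would call `attach_downsampling(plot, source, xs, ys)` once after building the plot. A zoom typically changes both `start` and `end`, so the callback may run twice per gesture; debouncing is left out for brevity.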

Embedding
Is it possible to embed a bokeh plot into website and have the data be transferred e.g. using websocket from bokeh server for such plot? I might have 2GB of timeseries data on HDD of server or in HDFS and I want user to be able to view parts of data. E.g. when fully zoomed out to view a downsampled version of timeseries. When embedding js+divs generated by script, div = components(plots) I understand that the data for the plot is embedded within the script. Can this be avoided? Generally I am interested in a combination of embedded plots + dynamic loading of data.

Yes, certainly. Configure the initial plot data to have just whatever subset is appropriate to the resolution you want to display initially, and then you can update the data dynamically in the manner described above.

Bokeh server
I am having a hard time understanding the role of the Bokeh server. E.g. when I plot points in Python and use output_server("name"), is the data for the plot transferred to the Bokeh server over the network? Is it transferred as JSON or binary, or does the Bokeh server use the same in-memory data I worked with in my Python script before constructing the plot? What if I want to change the data, or e.g. the colors of the points, in reaction to user actions, or asynchronously when I measure new data points for my time series? Can I just add new rows to my Pandas DataFrame and issue some kind of update to the server?

"output_server" is very limited; for your use case you will really want to create an "app" for the server. There is a larger discussion of use cases for the server and ways to use it here:

Bokeh server — Bokeh 3.3.2 Documentation

Please let me know what questions you have after looking at that, so that I may update the docs with more or different explanations.

Currently all data is converted to a JSON format before being sent over a websocket wire protocol. The protocol has been designed with the ability to send multi-part messages and binary frames in mind, but this capability has not been used yet.

Binary data transfer
I have seen a proposal on GitHub regarding binary transfer of data (Issues · bokeh/bokeh · GitHub). I think it is not implemented yet. Is it possible to hack this in, e.g. using callbacks? For example, on zoom, fetch the data from my Python API, where I would downsample and serve it in binary format from Pandas/numpy, and use it as a typed array in JS? Constructing JSON in Python can take a lot of time.

As you mention, binary transfer for arrays is not yet implemented. I don't see an easy way to "hack" this. Given that any disturbance in a websocket protocol is going to be difficult or impossible to recover from, we have deliberately not made direct access to the websocket connection easy for users.
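
To get a rough feel for the overhead in question, the sketch below encodes one day of 1 Hz samples both as JSON text and as packed float64 bytes, using only the Python standard library. It is not Bokeh's wire format, and the sample values are made up; it only shows the kind of comparison you could run on your own data.

```python
import json
from array import array

# Made-up data: one day of samples at 1 point per second.
values = [i * 0.1 for i in range(24 * 60 * 60)]

json_bytes = json.dumps(values).encode("utf-8")  # text encoding
packed = array("d", values).tobytes()            # 8 bytes per float64

print(f"JSON:   {len(json_bytes)} bytes")
print(f"binary: {len(packed)} bytes")

# Decoding the packed bytes gives back exactly the same floats.
decoded = array("d")
decoded.frombytes(packed)
assert decoded.tolist() == values
```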

Thanks,

Bryan


--
You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/64890475-d18d-417c-9801-20cb7ce60028%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.

Marcel_Nemet · January 15, 2016, 8:01am

Hi Bryan and bokeh team,

thank you for all the answers and support. I now understand Bokeh much better. I have tested running plots in the notebook, running the script in the Bokeh server using “bokeh serve myapp.py”, and even using the client to create a session and document with session.loop_until_closed(), for the downsampling example from yesterday. For the last case I understand that "in addition to network traffic between the browser and the server, there is network traffic between the python client and the server as well" (as stated in the documentation). So when zooming occurs in the browser, the callback goes via the Bokeh server to my client, which has to be running session.loop_until_closed(); the client sends the downsampled data to the server, and the server sends it to the browser.

So I understand that the more efficient way is to use “bokeh serve myapp.py” i.e. to run the code on server directly and avoid the round trip to client.

Now my questions are:

1) When running myapp.py on the server, what if I want to spin off a new document or a new document root (= a new figure)? Is there some API to do that programmatically? In my web app I want the user to be able to spin off a new interactive plot (such as the downsampling example) by selecting parameters (e.g. dataset, analytic method).

I thought that using the client approach, where one can programmatically create sessions, documents and figures, would solve this. But then I had to run loop_until_closed on my client in order to respond to on_change() callbacks.

2) Can an application in the Bokeh server respond to an asynchronous call? E.g. I would like to notify my app running on the Bokeh server that there is new data in the time series. Then the code in the Bokeh app would fetch the data from a remote location and include it in the plot. Or maybe I want to notify my Bokeh app that a computation has finished on my analytics server and the new results are ready to be loaded.

I think the questions are related. I guess once I can make an app running on the Bokeh server respond to asynchronous calls, I can just send it the necessary parameters and run the code to create a new figure/session as described in 1).

Is this (invoking a function code on bokeh server asynchronously) possible?

Marcel


Havoc_Pennington · January 18, 2016, 5:53pm

1) When running myapp.py on server, what if I want to spin off new
document or new document root (=new figure)? Is there some API to do that
programmatically? In my web app I want user to be able by selecting
parameters (e.g. dataset, analytic method) to spin off a new interactive
plot. (Such as the down-sampling example)

The short answer here is no, there isn't a server-side API right now to
make new sessions. If you wanted to add one, perhaps it would be reasonable
to put this in ServerContext.
The docs to get started are
http://bokeh.pydata.org/en/latest/docs/dev_guide/server.html

It could be as simple as

session_context = server_context.create_session(document)

I thought that using the client approach where one can programmatically
create sessions, documents and figures would solve this. But then I had to
run loop_until_closed on my client to achieve responding to on_change()
callbacks.

If you didn't see it, see also
https://groups.google.com/a/continuum.io/d/msg/bokeh/2qo6Km_XUow/wihBDG3dBAAJ

2) Can application in bokeh server respond to asynchronous call, e.g. I
would like to notify my app running on bokeh server that there is a new
data in the timeseries. Then the code in the bokeh app would fetch the data
from remote location and include it into plot. Or maybe I want to notify my
bokeh app that computation has finished on my analytics server and the new
results are ready to be loaded.

All you can do really is respond to changes in the document. So you'd have
to put something in the doc that changes in this case.
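
One way to work within that constraint is to let the app itself check for new data on a timer and change the document when something arrives. The sketch below assumes a Bokeh document `doc` (e.g. `curdoc()` inside the app), a `ColumnDataSource` `source` with `x` and `y` columns, and a hypothetical `fetch()` function of your own that returns the full sorted series as `(timestamps, values)` lists. It relies on `add_periodic_callback` and `ColumnDataSource.stream`, so it needs a Bokeh release that provides both. Note that this is polling, not a true push notification.

```python
from bisect import bisect_right


def rows_after(timestamps, values, last_seen):
    """Return samples strictly newer than last_seen as column lists."""
    i = bisect_right(timestamps, last_seen)
    return {"x": timestamps[i:], "y": values[i:]}


def attach_polling(doc, source, fetch, period_ms=1000):
    """Every period_ms, append any samples newer than the last one shown."""
    xs = source.data["x"]
    state = {"last": xs[-1] if len(xs) else float("-inf")}

    def poll():
        # fetch() is a placeholder for your own data access code.
        timestamps, values = fetch()
        new = rows_after(timestamps, values, state["last"])
        if new["x"]:
            source.stream(new)  # appends rows to the existing columns
            state["last"] = new["x"][-1]

    doc.add_periodic_callback(poll, period_ms)
```

Because the update happens inside the app, the change to `source` is mirrored to the browser like any other document change.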

Havoc
