After some weeks of struggle and with a lot of help from the Bokeh community, we proudly announce the first demo of our prototype “DynamicSubSampling” (DSS).
Stupid name - we will come up with something better.
The aim of DSS is to give Bokeh programmers a technology that lets their remote users navigate smoothly through a fairly large dataset while only transferring and showing the appropriate chunk and resolution of the data.
A demo says more than a thousand words:
http://graphdemo.inqbus.de/diagramview/PBL_20160101.h5
Login: admin:password
There you can navigate in a large dataset: ceilometer data of the planetary boundary layer heights between 1 January 2016 and 1 July 2016 in 15-second resolution, with potentially three data points per timestamp.
There are about 1 million timestamps, so in total the data basis is roughly 3 million data points, residing in an HDF5 file on our VM server.
- The backend machinery is not optimized yet. Each AJAX request hitting the VM results in a reopening of the HDF5 file; there is no caching yet.
- The VM runs together with 5 others on a 7-year-old host.
- We have two WSGI threads, using gunicorn.
To give you some perspective:
- If you load one day of this data using the usual Bokeh methods, you will have to wait 20 seconds for the initial load. Afterwards you can manipulate the data quite quickly.
- If you load one month of this data using the usual Bokeh methods, you will blow up your browser (out of memory).
What does DSS do? Basically three things:
- Hooking intelligently into Bokeh events. DSS reacts to changes of the plot's axes, but only in a filtered way: if DSS sees a number of consecutive events within a small time window, it reacts only to the last one.
- Transferring metadata to the server and receiving new data from the server for the Bokeh data sources. For this task we invented a new protocol based on HTML5 binary transport to send multidimensional complex data as a single chunk of binary data that is de-marshalled on the client side into typed JS arrays that go straight into Bokeh. That is FAST transfer.
- Intelligently scaling and filtering the data. DSS shows you only what you are able to see.
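To illustrate the second point, here is a minimal Python sketch of packing several named arrays into one binary chunk and unpacking them again without parsing any text. This is not the DSS protocol: the byte layout and the names `pack_columns` and `unpack_columns` are made up for this example, and it only handles 1-D arrays.

```python
import struct

import numpy as np

# Hypothetical layout (not the DSS wire format):
#   uint32 column count, then per column:
#   uint16 name length, uint8 dtype-code length, uint32 data length,
#   followed by the name, the dtype code (e.g. "<f8") and the raw data.
HEADER = "<HBI"


def pack_columns(columns):
    """Pack a dict of 1-D arrays into a single bytes object."""
    parts = [struct.pack("<I", len(columns))]
    for name, values in columns.items():
        arr = np.ascontiguousarray(values)
        name_b = name.encode("utf-8")
        dtype_b = arr.dtype.str.encode("ascii")
        data = arr.tobytes()
        parts.append(struct.pack(HEADER, len(name_b), len(dtype_b), len(data)))
        parts.extend([name_b, dtype_b, data])
    return b"".join(parts)


def unpack_columns(blob):
    """Rebuild the dict of arrays as read-only views onto the blob."""
    (count,) = struct.unpack_from("<I", blob, 0)
    offset = 4
    columns = {}
    for _ in range(count):
        name_len, dtype_len, data_len = struct.unpack_from(HEADER, blob, offset)
        offset += struct.calcsize(HEADER)
        name = blob[offset:offset + name_len].decode("utf-8")
        offset += name_len
        dtype = np.dtype(blob[offset:offset + dtype_len].decode("ascii"))
        offset += dtype_len
        columns[name] = np.frombuffer(
            blob, dtype=dtype, count=data_len // dtype.itemsize, offset=offset
        )
        offset += data_len
    return columns
```

In a browser the unpacking side would create typed arrays over the received ArrayBuffer instead of NumPy views. Typed array views in JavaScript need byte offsets aligned to the element size, so a real layout would probably pad each data section.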
An example of the scaling and filtering step:
Your diagram has a visible resolution of 600 pixels on the X-axis, and you have 600,000 data points in the chosen X-interval. You can then plot only about 600 data points without cluttering your display. In this case DSS filters the data by regridding with an average operator:
chunks of 1000 data points are averaged in X and in Y to form one new regridded data point. Error bars can also be obtained in that fashion, but are not shown in the demo.
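The averaging in this example can be sketched with NumPy as follows. This is only an illustration of the idea, not the DSS code: the function name `regrid_mean` is made up, trailing points that do not fill a whole chunk are simply dropped, and using the standard deviation per chunk as the error bar is an assumption.

```python
import numpy as np


def regrid_mean(x, y, n_pixels):
    """Average x and y in equal-sized chunks so that at most n_pixels points remain.

    Returns (x_mean, y_mean, y_std); y_std could serve as an error bar.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) <= n_pixels:
        # Nothing to reduce: every point already gets its own pixel.
        return x, y, np.zeros_like(y)
    chunk = len(x) // n_pixels   # e.g. 600_000 // 600 == 1000
    usable = chunk * n_pixels    # assumption: drop the incomplete last chunk
    x_chunks = x[:usable].reshape(n_pixels, chunk)
    y_chunks = y[:usable].reshape(n_pixels, chunk)
    return x_chunks.mean(axis=1), y_chunks.mean(axis=1), y_chunks.std(axis=1)


# 600,000 points on a 600 pixel wide plot -> 600 averaged points
x = np.arange(600_000)
y = 2.0 * x
x_new, y_new, y_err = regrid_mean(x, y, 600)
print(len(x_new))
```

The reshape trick only works for evenly sized chunks; data with gaps or irregular timestamps would need binning by X value instead.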
DSS works acceptably, but it has some flaws:
- We have not taken care of boundary effects. When zooming in, the curve loses its connection to the data outside of the viewport.
- IE does not work. Firefox and Chrome are working.
Problems with Bokeh: after some zooming and panning, the plotting area shifts down and to the right, leaving a growing gray streak at the top left. No clue at all where this comes from.
Once we have brought the code to some maturity, we will release it as an open source extension to Bokeh. But there is still a long way to go.
We would like to hear your comments and complaints.
Cheers,
Volker