Hi,
my name is Jan Girlich, I'm from Germany and I work in IT security. I
studied IT, previously worked as a C++ developer and know my way around
Python as well.
I'm currently working on a visualization project with about 15 to 20
million data points, which need to be browsable by panning and zooming.
Trying to solve this is how I found Bokeh.
The data consists of several time series, which should be displayed in
an x-y graph with one axis being the time and the other axis
representing the series. So, for example, x is the time and y is
"series 1", "series 2" and so on. Every x-y coordinate represents one
data point with a value between 0 and 255, drawn as a rectangle colored
in a shade of gray according to its value. When hovering over a
coordinate, a box should show more details.
Now, this is already really slow when simply dumping about 400,000 data
points into a ColumnDataSource, so I'd like to discuss my approach to
this problem:
My idea is to put all the data in a pandas DataFrame and, depending on
the zoom level, trigger a callback on the Bokeh server, which then finds
the data points that fall within the same screen pixel and calculates a
mean gray value for that pixel. The goal is to keep the number of
elements to display low (and then maybe use WebGL to display them, so
panning is fast). Do you think this could work? How would you solve this?
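
To make the per-pixel averaging idea concrete, here is a minimal sketch
of the binning step using pandas only. The column names (t, series,
value), the function name downsample and the 800-pixel plot width are
assumptions made up for illustration, and the data is random; this is
one possible way to do it, not a tested solution for the full data set.

```python
import numpy as np
import pandas as pd


def downsample(df, t_start, t_end, width_px):
    """Average `value` per (series, pixel column) for the visible time range.

    Hypothetical helper: assumes columns `t` (numeric time), `series`
    and `value` (0..255).
    """
    # Keep only the points inside the currently visible time window.
    visible = df[(df["t"] >= t_start) & (df["t"] < t_end)]
    # Map each timestamp to the pixel column it would fall into.
    scale = width_px / (t_end - t_start)
    px = ((visible["t"] - t_start) * scale).astype(int)
    # One mean gray value per series and pixel column.
    out = (
        visible.assign(px=px)
        .groupby(["series", "px"], as_index=False)["value"]
        .mean()
    )
    # Center of each pixel column in data coordinates, for plotting.
    out["t"] = t_start + (out["px"] + 0.5) / scale
    return out


# Made-up example data: 4 series with 10,000 time steps each.
rng = np.random.default_rng(0)
n_series, n_steps = 4, 10_000
df = pd.DataFrame({
    "t": np.tile(np.arange(n_steps), n_series),
    "series": np.repeat([f"series {i + 1}" for i in range(n_series)], n_steps),
    "value": rng.integers(0, 256, size=n_series * n_steps),
})

# Fully zoomed out on an (assumed) 800 pixel wide plot.
small = downsample(df, t_start=0, t_end=n_steps, width_px=800)
print(len(df), "->", len(small), "rectangles")
```

The result has at most (number of series) x (plot width in pixels)
rows, no matter how many raw points are in view. On the Bokeh server, a
function like this could be called whenever the visible x range changes
and its result written into the ColumnDataSource; that wiring is not
shown here.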
Cheers
Jan