My implementation of server-side downsampling: how to avoid a glitch when replacing data in a graph?

Hi, I have implemented server-side downsampling for large datasets. I tested it with 800,000+ and 8,000,000 points, but it should work for more if memory allows. It is not optimal: I started learning Python, NumPy and pandas last week, and Bokeh this week.

How it works: the original dataset is a time series with 800,000+ points. Downsampled versions of the dataset are precomputed, each with a sampling frequency 4 times lower than the one before. A callback detects the start and end of the x axis when the user zooms, and based on the current zoom I load the version of the dataset with the appropriate sampling frequency, so that at most (width of plot in pixels) * 4 points are plotted at any time.
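The precomputation and level selection described above could be sketched roughly like this (the names `build_pyramid` and `pick_level` are hypothetical, not from the attached script):

```python
import numpy as np

def build_pyramid(values, factor=4, min_points=1000):
    """Return a list of progressively downsampled copies of `values`.

    Level 0 is the original array; each further level keeps every
    `factor`-th point of the previous one (4x lower sampling frequency).
    """
    levels = [values]
    while len(levels[-1]) // factor >= min_points:
        levels.append(levels[-1][::factor])
    return levels

def pick_level(levels, visible_points, plot_width_px, max_per_px=4):
    """Pick the finest level whose point count for the visible range
    stays within plot_width_px * max_per_px; fall back to the coarsest."""
    budget = plot_width_px * max_per_px
    for k in range(len(levels)):
        if visible_points // (4 ** k) <= budget:
            return k
    return len(levels) - 1
```

A real implementation would downsample with min/max per bin rather than plain striding (see the patch discussion below in the thread), but the level bookkeeping is the same.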

I display the data not as a line but as a patch, so that I can plot the min and max instead of the mean. I think that is more informative than the mean.
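The min/max patch can be built as one closed polygon: walk left-to-right along the per-bin maxima, then right-to-left along the per-bin minima. A minimal sketch (`minmax_patch` is a hypothetical helper, not code from the attachment):

```python
import numpy as np

def minmax_patch(x, y, bin_size):
    """Return (patch_x, patch_y) for a min/max envelope patch."""
    n = (len(y) // bin_size) * bin_size      # drop the ragged tail
    yb = y[:n].reshape(-1, bin_size)
    xb = x[:n].reshape(-1, bin_size)
    centers = xb.mean(axis=1)
    top, bottom = yb.max(axis=1), yb.min(axis=1)
    # top edge left-to-right, then bottom edge right-to-left, closing the patch
    patch_x = np.concatenate((centers, centers[::-1]))
    patch_y = np.concatenate((top, bottom[::-1]))
    return patch_x, patch_y
```

The resulting arrays can be fed straight to a `patch` glyph as its `x` and `y` columns.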

When the data is replaced, a short glitch is visible in the browser. Can this be avoided, e.g. by repainting the plot in a different way? Try the code and see for yourselves. Ideally I would like to paint the new version of the dataset first, before removing the version with the unsatisfactory sampling frequency. Is this possible?

I attach the code; it should be run as

bokeh serve --show demonstration_downsampling_2.py

bokeh 0.11.0

demonstration_downsampling_2.py (6.23 KB)

Marcel,

Truly amazing timing, I literally *just* hit send on a reply to your earlier email. Apologies for the delay; both the mailing list traffic and the GH issues have seen a spike, and it can be hard to keep up. But I see you have fared fairly well in any case! Let me try out your example and see if I can offer any specific suggestions. In any case, would you be interested in contributing a downsampling example as a PR to the project? It would be extremely appreciated if you have the ability to do that.

Thanks,

Bryan

···

On Jan 14, 2016, at 10:49 AM, Marcel Német <[email protected]> wrote:


Marcel,

I have an immediate suggestion; can you try it out and see if it improves the flickering? I do not think the flickering is caused by things being drawn too slowly; rather, it looks like things are being drawn "out of sync". Specifically, you are doing things like:

  l.data_source.data['x'] = np.concatenate((indices, indices[::-1]))
  l.data_source.data['y'] = np.concatenate((values, values[::-1]))

This generates two separate messages, each causing the plot to update. So for a brief instant, you are displaying new "x" values together with old "y" values. What I think would fix things for you is to "batch" the updates together, with something like:

  l.data_source.data = dict(
    x=<your new x data>,
    y=<your new y data>
  )

This will generate a single update message that updates both x and y at the "same time". This is probably a point we ought to highlight and describe more explicitly.
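A toy model (not Bokeh's actual `ColumnDataSource`, just an illustration of the messaging semantics described above) shows why per-key assignment produces two messages while whole-dict assignment produces one:

```python
class Recording(dict):
    """A dict that logs a message for every per-key assignment."""
    def __init__(self, *args, log=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.log = log if log is not None else []

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.log.append("update " + key)

class FakeSource:
    """Stand-in for a data source: replacing .data wholesale logs one message."""
    def __init__(self):
        self.log = []
        self._data = Recording(log=self.log)

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, new):
        self._data = Recording(new, log=self.log)
        self.log.append("update " + ",".join(sorted(new)))

src = FakeSource()
src.data["x"] = [1]              # first message: x is new, y is still old
src.data["y"] = [2]              # second message: now y catches up
src.data = {"x": [3], "y": [4]}  # single message: x and y change atomically
```

Between the first two messages the plot briefly shows mismatched columns, which is exactly the flicker; the third pattern avoids that window entirely.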

Bryan

···

On Jan 14, 2016, at 11:02 AM, Bryan Van de Ven <[email protected]> wrote:


Thank you, that helped; I would probably not have figured it out myself. Now I see that your demo example does it that way as well.

I think for a PR demo a much simpler and shorter down-sampling example would be better; this code is quite long and specific to my use case. I think it would help if you just added to the documentation that down-sampling can be achieved by using on_change callbacks on x_range / y_range. At first I imagined that you already had some algorithm included in Bokeh. I think in a previous version of Bokeh I found a line_downsample function somewhere, so I was looking for that function in 0.11.0.

Also, with on_change callbacks it took me a while to understand that I can simply listen to a change of any property. In the beginning I was trying to find a list of events, such as 'selected', that can be listened to. After realizing that it can be done on any property, and finding the correct properties to listen on, the rest was just implementation.

Now I am listening to

p.x_range.on_change('start', x_range_change_callback)

p.x_range.on_change('end', x_range_change_callback)

so when the user zooms I usually get two callbacks per zoom: start changes, and then end changes. Is it possible to listen to something else, so that I get only one callback?
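One workaround (a pure-Python sketch, not a Bokeh API) is to let both the 'start' and 'end' callbacks merely record the latest window, and run the expensive reload once from a separate `flush` step, e.g. driven by a periodic or timeout callback if your Bokeh version supports one:

```python
class CoalescedRange:
    """Collapse the separate 'start' and 'end' callbacks into one reload."""
    def __init__(self, reload):
        self._reload = reload          # expensive work: pick level, swap data
        self._window = [None, None]
        self._dirty = False

    def on_start(self, attr, old, new):    # register for x_range 'start'
        self._window[0] = new
        self._dirty = True

    def on_end(self, attr, old, new):      # register for x_range 'end'
        self._window[1] = new
        self._dirty = True

    def flush(self):                       # call from a periodic/timeout hook
        if self._dirty:
            self._dirty = False
            self._reload(*self._window)

calls = []
cr = CoalescedRange(lambda s, e: calls.append((s, e)))
cr.on_start('start', 0, 10)    # zoom changes start...
cr.on_end('end', 100, 50)      # ...and then end
cr.flush()                     # ...but the reload runs only once, with both
```

This also guards against doing work on the intermediate window where the new start is paired with the old end.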

On Thursday, January 14, 2016 at 18:17:40 UTC+1, Bryan Van de Ven wrote:
