image performance

Hi,

I am visualising large matrices using Bokeh, but I cannot understand why it takes so much longer to render a matrix with Bokeh than with matplotlib.

Both renderings are done in the notebook web interface under the exact same conditions. The only difference is that matplotlib's image function renders within 1 second, while Bokeh takes 17 seconds to render the same plot.

Is there anything I am doing wrong?

If this is a problem with the Bokeh library, I think it should be considered for fixing, since many people in scientific communities visualize large colormapped matrices. If this is not really possible in Bokeh, it is a big drawback.

Below is the code for creating a matrix visualization with a default colormap, first in Bokeh and then in matplotlib:

import numpy as np
import matplotlib.pyplot as plt
from bokeh.plotting import figure, show

# Create test matrix
x = np.random.randn(1200, 1200)

# Create Bokeh image (the image glyph takes a list of 2D arrays)
x_range = [0, x.shape[0]]
y_range = [0, x.shape[1]]
p = figure(x_range=y_range, y_range=x_range)
p.image(image=[x], x=0, y=0, dw=x.shape[1], dh=x.shape[0], palette="Spectral11")
show(p)  # <--- Renders in ~17 seconds

# Create matplotlib image
plt.imshow(x)  # <--- Renders in ~1 second

Thanks

There is an open issue:

  https://github.com/bokeh/bokeh/issues/4487 (Imshow slow in notebooks)

While the user code is superficially similar, Bokeh and MPL could not be more different under the covers. MPL generates a static image, once, as binary data, and the browser displays it in an image format (.png, etc., which browsers are ruthlessly optimized for). In order to afford all the interactive capabilities, Bokeh (the Python library) creates an image, then serializes it (currently to JSON, which is part of the issue) and sends it to a completely different runtime library in the browser (BokehJS, the JavaScript library), where it is deserialized and then rendered on an HTML5 canvas.

So, this is a fairly classic trade-off. Exchanging a declarative JSON specification between runtimes enables many great new features (and works perfectly well for the 1D data columns that drive most Bokeh glyphs) but does pose problems for 2D arrays, where the number of points increases as N**2. There are places where special-case optimizations can probably be made to improve things for images, possibly substantially, but they will require changes at the lowest level of the librar(ies), so they must be undertaken carefully and with an excess of testing. This is especially true in the notebook, where combining two large, complicated pieces of software is a delicate operation, and where problems are often extremely difficult to debug.
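To get a rough feel for why a text encoding hurts as N grows, here is a minimal sketch using only the Python standard library. It does not reproduce Bokeh's actual serializer; `payload_sizes` is a hypothetical helper that simply compares the size of an N x N matrix of floats written as JSON text against the same values packed as raw 64-bit floats.

```python
import json
import random
from array import array


def payload_sizes(n, seed=0):
    """Return (json_bytes, binary_bytes) for an n x n matrix of random floats."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

    # Text encoding: every float becomes a decimal string plus separators,
    # and the receiver has to parse each one back into a number.
    json_bytes = len(json.dumps(rows).encode("utf-8"))

    # Binary encoding: a fixed 8 bytes per float64, with no parsing step.
    flat = [value for row in rows for value in row]
    binary_bytes = len(array("d", flat).tobytes())
    return json_bytes, binary_bytes


if __name__ == "__main__":
    for n in (300, 1200):
        json_bytes, binary_bytes = payload_sizes(n)
        print(f"{n}x{n}: JSON {json_bytes / 1e6:.1f} MB, "
              f"float64 binary {binary_bytes / 1e6:.1f} MB")
```

Both sizes grow with N**2, but the JSON form carries a larger constant per value and also has to be parsed on the receiving side, which this sketch does not time.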

I can't speculate on when this issue will be addressed. There are many important issues to work through (the issue tracker now exceeds 750), and only a handful of people currently do low-level core development on a regular basis.

TLDR; Small images are often "ok". But if you need to use large images, e.g. 1200x1200 like you have, I'd have to recommend using MPL for the time being.

Thanks,

Bryan

···

On Oct 29, 2016, at 3:48 AM, Bokeh coder <[email protected]> wrote:

--
You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/d9c2a79a-982a-4595-a31f-b10104975966%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.

Hi Bryan,
Thank you for the in-depth description of what causes the problem.

I do see the challenge, and obviously a render-once image as done by MPL is far from optimal. I did not know that this is what MPL does.

As a suggestion, maybe it would make sense to implement similar functionality in Bokeh: render once and then use that until a proper solution to this problem has been found.

Do you think this would make sense? I could possibly have a go at that if you could guide me a bit as to where to change things and what to look out for.

Another thought I had was whether it would be possible to use datashader or holoviews to visualize large images like 1200x1200 or larger.

Maybe some sort of automated tiling could be done when calling the image function.

Are any such ideas being considered?

Best regards

···

On Sat, Oct 29, 2016 at 3:38 PM, Bryan Van de Ven [email protected] wrote:


Hi,

I was perhaps not as clear as I could have been in discussing the "render once" aspect. The problem isn't in generating a 1200x1200 RGBA image as a NumPy array; it's in the serialization, communication, and deserialization overhead. But communicating with BokehJS via a declarative protocol is one of the most fundamental design principles of Bokeh, and it is built into every part of the library. It's what enables other language bindings (like RBokeh, bokeh-scala, Bokeh.Lua, and hopefully others) to exist. It's also what lets the Bokeh server connect interactive visualizations and data applications in the browser directly to real Python code running all the PyData and SciPy stack libraries. Those are by far the most important goals of the library. Any solution for this one special case has to be compatible with them, which means finding ways to improve the serialization, communication, and deserialization in the special case of 2D data (which is not used in very many places in Bokeh, which is why it currently suffers).
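As an illustration only, one way to keep a declarative JSON message while avoiding per-number text encoding is to carry the array as a base64 payload next to its shape and dtype. The sketch below is not Bokeh's protocol. The function names and the message fields (`shape`, `dtype`, `data`) are assumptions made up for this example, and it uses the machine's native byte order, which a real cross-runtime protocol would have to pin down.

```python
import base64
import json
from array import array


def encode_matrix(rows):
    """Pack a list of equal-length float rows into a JSON message with a base64 payload."""
    height = len(rows)
    width = len(rows[0]) if height else 0
    flat = array("d", (value for row in rows for value in row))
    message = {
        "shape": [height, width],   # lets the receiver rebuild the 2D layout
        "dtype": "float64",         # hypothetical field naming the element type
        "data": base64.b64encode(flat.tobytes()).decode("ascii"),
    }
    return json.dumps(message)


def decode_matrix(text):
    """Inverse of encode_matrix: rebuild the nested list of rows."""
    message = json.loads(text)
    height, width = message["shape"]
    flat = array("d")
    flat.frombytes(base64.b64decode(message["data"]))
    return [list(flat[r * width:(r + 1) * width]) for r in range(height)]


if __name__ == "__main__":
    matrix = [[0.5, -1.25, 3.0], [2.0, 0.0, -7.5]]
    wire = encode_matrix(matrix)
    assert decode_matrix(wire) == matrix
    print(len(wire), "characters on the wire")
```

The message stays valid JSON, so the rest of a declarative specification could travel alongside it unchanged. Only the bulk data is treated specially.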

I certainly would not turn down any help with this task, but I should be up front and state that it is not a "starter" or "beginner" task. Improving this will require all of: Python, CoffeeScript, notebook internals, binary encodings, websockets, and extensive test writing (both Python and JavaScript). If you are interested, I am happy to have a call to start discussing details and possible design avenues.

As an aside, Bokeh already supports rendering tiles from any tile server. If you can pre-cook your data into image tiles (e.g. .png, .jpg) and serve them, that might be useful in the short term. There is nothing automated, though. That would not be a small undertaking, and there are simply not the human resources available (and it would duplicate other existing tile-server projects).

Datashader just generates images. Datashader is helpful when you literally have 100 million or a billion points, because image creation time dominates, and because there's no way to send a billion points into the browser (even if you could without crashing the browser, it would take forever). Once it makes an image, Datashader is a downstream client of the Bokeh image glyph: any image it generates has to go through the same machinery, so if you are still talking about final images of the same size, I don't think it would offer much help here. If you mean to use Datashader to generate smaller images (say 600x600?) that could be zoomed in and out of efficiently, that could possibly be a reasonable avenue to pursue.

Thanks,

Bryan

···

On Oct 29, 2016, at 12:25 PM, Michael Hansen <[email protected]> wrote:


Michael,

If you just want to dump out a static image with no axes and don’t need to zoom into it, datashader will definitely be very fast:

import datashader.transfer_functions as tf
from xarray import DataArray

# x is the 1200x1200 NumPy array from the original post
tf.shade(DataArray(x))

(should take microseconds if you’re not displaying it, and still well under a second to display the result in a Jupyter notebook cell). Plus, you’ll get the full, native resolution of your image, whereas the default Matplotlib image is massively downsampling your image (one more reason Matplotlib is so fast) and the default Bokeh image will slightly downsample it.

But you won’t get any axes, colorbars, or legends, for which you need to pass either the datashaded image or the original array to Bokeh or Matplotlib, leaving you back where you started.

As Bryan indicated, where datashader will be most useful is if you have images much larger than your screen resolution, or at least much larger than whatever plot size you choose. Then datashader will let you keep the full array at the Python side, which can handle it just fine. Datashader then passes only a much downsampled version to the browser, updated each time you zoom to make it feel like the full resolution was there the whole time.
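The zoom behaviour described above can be sketched in plain Python. This is not Datashader's implementation: Datashader aggregates the data properly, whereas this hypothetical `render_viewport` helper just crops the full array to the visible region and keeps every k-th sample so that the result never exceeds the plot size. The function name and the `max_pixels` parameter are assumptions for illustration.

```python
def render_viewport(data, x0, x1, y0, y1, max_pixels=600):
    """Crop `data` (a list of rows) to the viewport and decimate it to fit the plot.

    The full array stays on the Python side; only the returned, smaller
    array would need to be sent to the browser after each zoom or pan.
    """
    # Clamp the requested viewport to the bounds of the data.
    x0, x1 = max(0, x0), min(len(data[0]), x1)
    y0, y1 = max(0, y0), min(len(data), y1)

    # Ceiling division, so the output has at most max_pixels samples per side.
    step_x = max(1, -(-(x1 - x0) // max_pixels))
    step_y = max(1, -(-(y1 - y0) // max_pixels))

    return [row[x0:x1:step_x] for row in data[y0:y1:step_y]]


if __name__ == "__main__":
    full = [[r * 1200 + c for c in range(1200)] for r in range(1200)]

    # Zoomed all the way out: the whole array, coarsely sampled.
    overview = render_viewport(full, 0, 1200, 0, 1200, max_pixels=300)

    # Zoomed in: a small region, now at full resolution.
    zoomed = render_viewport(full, 100, 400, 200, 500, max_pixels=300)

    print(len(overview), len(overview[0]))
    print(len(zoomed), len(zoomed[0]))
```

In both calls the payload is bounded by `max_pixels` per side regardless of how large the full array is, which is the property that makes the approach attractive for images much larger than the screen.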

Using HoloViews won’t generally help you get around issues like this – if you use HoloViews’ Matplotlib backend, it should have about the same speed as regular Matplotlib, and if you use HoloViews’ Bokeh backend, then all the data will get sent to the browser, slowing things down as always. But at least HoloViews makes it very simple to switch between the two backends as needed, so that you could use Bokeh for most plots and switch to mpl for the few that slow things down.

Actually, one could probably modify HoloViews to add some new dynamic features to its Matplotlib backend support, writing some JavaScript to capture mouse events to support limited zooming, with zooming causing a new Matplotlib image to be rendered and displayed each time. That way you could have the speed of Matplotlib and some of the interactivity (though not all) of Bokeh, in some cases. But that would definitely be a job for someone with more JavaScript knowledge than I have, and would only work in some fairly limited cases, so it’s not something we’d be likely to pursue.


Jim


Hi Jim and Bryan,
Thank you for the comments. I am beginning to understand the difficulties involved in this.

I tried your datashader example, Jim, and it does indeed render fast. It takes around 4-5 seconds, compared to Bokeh's 17 seconds and MPL's 1 second. However, as you say, datashader does not downsample like MPL does, which is a good thing, so that is worth the extra seconds.

The 1200x1200 image was just an example I made. In reality, the matrices and images we are trying to visualize are somewhat bigger.

One question, then: what default colormap does datashader use, and over what range? And can I change that colormap?

I was intrigued by your comment, Jim:

“As Bryan indicated, where datashader will be most useful is if you have images much larger than your screen resolution, or at least much larger than whatever plot size you choose. Then datashader will let you keep the full array at the Python side, which can handle it just fine. Datashader then passes only a much downsampled version to the browser, updated each time you zoom to make it feel like the full resolution was there the whole time.”

Does that mean that I can somehow keep my big image on the server and use some frontend to interact with it (zoom, pan, etc.)?

If yes, this could be very interesting. Do you maybe have a small example of how this could be done with a big image, say 10000x8000?

Thanks a lot and best regards

Michael

···

On Sat, Oct 29, 2016 at 11:18 PM, James Bednar [email protected] wrote:

Michael,

If you just want to dump out a static image with no axes and don’t need to zoom into it, datashader will definitely be very fast:

import datashader.transfer_functions as tf

from xarray import DataArray

tf.shade(DataArray(x))

(should take microseconds if you’re not displaying it, and still well under a second to display the result in a Jupyter notebook cell). Plus, you’ll get the full, native resolution of your image, whereas the default Matplotlib image is massively downsampling your image (one more reason Matplotlib is so fast) and the default Bokeh image will slightly downsample it.

But you won’t get any axes, colorbars, or legends, for which you need to pass either the datashaded image or the original array to Bokeh or Matplotlib, leaving you back where you started.

As Bryan indicated, where datashader will be most useful is if you have images much larger than your screen resolution, or at least much larger than whatever plot size you choose. Then datashader will let you keep the full array at the Python side, which can handle it just fine. Datashader then passes only a much downsampled version to the browser, updated each time you zoom to make it feel like the full resolution was there the whole time.

Using HoloViews won’t generally help you get around issues like this – if you use HoloViews’ Matplotlib backend, it should have about the same speed as regular Matplotlib, and if you use HoloViews’ Bokeh backend, then all the data will get sent to the browser, slowing things down as always. But at least HoloViews makes it very simple to switch between the two backends as needed, so that you could use Bokeh for most plots and switch to mpl for the few that slow things down.

Actually, one could probably modify HoloViews to add some new dynamic features to its Matplotlib backend support, writing some JavaScript to capture mouse events to support limited zooming, with zooming causing a new Matplotlib image to be rendered and displayed each time. That way you could have the speed of Matplotlib and some of the interactivity (though not all) of Bokeh, in some cases. But that would definitely be a job for someone with more JavaScript knowledge than I have, and would only work in some fairly limited cases, so it’s not something we’d be likely to pursue.

You received this message because you are subscribed to the Google Groups “Bokeh Discussion - Public” group.

To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].

To post to this group, send email to [email protected].

To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/CAMxakRdserUArqQ5yRV7U2PVQjVAS24RPVg990x%2BypOwoY5bag%40mail.gmail.com.

For more options, visit https://groups.google.com/a/continuum.io/d/optout.

Jim

On Sat, Oct 29, 2016 at 6:25 PM, Michael Hansen [email protected] wrote:

Hi Bryan,
Thank you for the in-depth description of what causes the problem.

I do see the challenge, and obviously render-once image as done by MPL is far from optimal - i did not know that this is what MPL does.

As a suggestion, maybe it would make sense to implement similar functionality in Bokeh - render once and then use that until a problem solution for this problem has been made.

Do you think this would make sense? I could possibly try to have a go at that if you could guide me a bit as to where to change things and what to look out for.

Another thought i had was if it would be possible to use datashader or holoviews to visualize large images like 1200x1200 or larger?

Maybe some sort of automated tiling could be done when calling the image function.

Are any such thought being considered?

Best regards

You received this message because you are subscribed to the Google Groups “Bokeh Discussion - Public” group.

To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].

To post to this group, send email to [email protected].

To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/CAPty9BN_8pTwp_dtdHKvzRF%3DXCgYeHa-VRXpYanhJgEz0baTCg%40mail.gmail.com.

For more options, visit https://groups.google.com/a/continuum.io/d/optout.

On Sat, Oct 29, 2016 at 3:38 PM, Bryan Van de Ven [email protected] wrote:

I can’t speculate on when this issue will be addressed: there are many important issues competing for attention (the issue tracker now exceeds 750), and only a handful of people currently do low-level core development on a regular basis.

TLDR; Small images are often “ok”. But if you need to use large images, e.g. 1200x1200 like you have, I’d have to recommend using MPL for the time being.

Thanks,

Bryan

Michael,

Datashader’s actual rendering time should be in microseconds for an array of that size, so I suppose your browser is slower than mine at displaying large images. Yes, you can change how the data is rendered, e.g.:

import datashader.transfer_functions as tf

from xarray import DataArray

from matplotlib.cm import viridis

tf.shade(DataArray(x), cmap=viridis, how='linear')

from bokeh.palettes import Spectral11

tf.shade(DataArray(x), cmap=Spectral11, how='log')

See the documentation at http://datashader.readthedocs.io/en/latest/api.html#datashader.transfer_functions.shade , along with examples in the pipeline and census notebooks at https://anaconda.org/jbednar/notebooks. The census example and most others show how to zoom, pan, etc. using InteractiveImage, and the landsat example shows how to resample an existing raster like you have here (rather than rasterizing individual points as in most other examples).

On Sun, Oct 30, 2016 at 7:01 AM, Michael Hansen [email protected] wrote:

Hi Jim and Bryan,
Thank you for the comments. I am beginning to understand the difficulties involved in this.

I tried your datashader example, Jim, and it does indeed render fast. It takes around 4-5 seconds, compared to Bokeh’s 17 seconds and MPL’s 1 second. However, as you say, datashader does not downsample like MPL does, which is a good thing, so that is worth the extra seconds.

The 1200x1200 image was just an example I made; in reality the matrices and images we are trying to visualize are somewhat bigger.

One question is then: what default colormap does datashader use, and what range? And can I change that colormap?

I was intrigued by your comment, Jim:

“As Bryan indicated, where datashader will be most useful is if you have images much larger than your screen resolution, or at least much larger than whatever plot size you choose. Then datashader will let you keep the full array at the Python side, which can handle it just fine. Datashader then passes only a much downsampled version to the browser, updated each time you zoom to make it feel like the full resolution was there the whole time.”

Does that mean that I can somehow have my big image on the server, and use some frontend to interact with it (zoom/pan etc.)?

If yes, this could be very interesting. Do you maybe have a small example of how this could be done with a big image, say 10000x8000?

Thanks a lot and best regards

Michael

Jim
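
The idea quoted above (keep the full array on the Python side and pass only a reduced view for the current zoom range) can be sketched roughly as follows, assuming NumPy. The function `view_for_range` and the `max_px` parameter are hypothetical, and the sketch uses plain strided decimation, which is much cruder than datashader's own resampling; the array is also kept smaller than 10000x8000 to stay light on memory.

```python
import numpy as np

def view_for_range(full, x0, x1, y0, y1, max_px=500):
    """Return a decimated view of full[y0:y1, x0:x1], at most max_px samples per axis."""
    step_y = max(1, -(-(y1 - y0) // max_px))  # ceiling division
    step_x = max(1, -(-(x1 - x0) // max_px))
    return full[y0:y1:step_y, x0:x1:step_x]

# Stand-in for a large image held on the Python side.
big = np.arange(2000 * 2500, dtype=np.float32).reshape(2000, 2500)

overview = view_for_range(big, 0, 2500, 0, 2000)   # whole image, decimated
zoomed = view_for_range(big, 1000, 1400, 500, 900) # small window, full resolution
```

Only `overview` or `zoomed` would need to be sent to the browser for a given view; the InteractiveImage examples mentioned above show how datashader wires this kind of update to zoom and pan events.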

Hey,

Something which might also give some speedup is to reduce the dynamic range of the data. If you have, for example, a float32 layer, converting it to RGBA (bytes) on the backend means you are still sending 32 bits per pixel to the browser. For the data I work with, you often don’t need the full dynamic range of the datatype your data is in (especially for viz purposes). If you have, for example, something like relative humidity (0-100), having it to 1 (or maybe even 0) decimal accuracy would be sufficient for visualization. You could convert it to a byte on the backend (data*255/100), send it over, reverse the conversion at the frontend, and then apply the colormap.

I’m not sure where Bokeh’s colormaps are currently applied (before or after sending to the browser), but I think technically it could be done in the browser. If this were combined with something like Matplotlib’s concept of normalizers (data scaling), it might result in some speedup for cases where the datatype can be reduced to something smaller than 32 bits.

Regards,
Rutger
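
A minimal sketch of the byte-quantization idea described above, assuming NumPy and data known to lie in the 0-100 range. The function names are hypothetical, and the reverse step is written in Python only for illustration; in the proposal it would run in the browser before the colormap is applied.

```python
import numpy as np

def quantize_to_uint8(data, vmin=0.0, vmax=100.0):
    """Scale values in [vmin, vmax] to bytes in [0, 255]."""
    scaled = (np.asarray(data, dtype=np.float64) - vmin) * 255.0 / (vmax - vmin)
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)

def dequantize_from_uint8(q, vmin=0.0, vmax=100.0):
    """Reverse the scaling, recovering approximate values in [vmin, vmax]."""
    return q.astype(np.float64) * (vmax - vmin) / 255.0 + vmin

rng = np.random.default_rng(0)
humidity = rng.uniform(0, 100, size=(1200, 1200)).astype(np.float32)

packed = quantize_to_uint8(humidity)      # 1 byte per value instead of 4
restored = dequantize_from_uint8(packed)  # within half a quantization step
```

With 256 levels over a 0-100 range the step is about 0.39, so the worst-case error is roughly 0.2. That is somewhat coarser than one-decimal accuracy, which may or may not matter for a given colormap.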

I believe the problem is mostly related to serialization/deserialization. In particular, as part of the server work, 2d arrays started getting serialized as "lists of lists" in JSON (not intentionally, per se, it just fell out of those big changes that way) and this is *very* slow. So I'm not sure how much something like this would help. But that said, you could probably create a custom extension that behaved in this way. As for color mapping, it can happen either in Python or in the browser, whichever you want.

The first thing to try for the core library is to make sure 2d arrays get flattened before serialization, but this means finding a way of communicating array shape information separately in some way that does not break other cases.

Thanks,

Bryan
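
To illustrate the difference between the two encodings described above, here is a small sketch using only NumPy and the standard `json` module. It is not Bokeh's actual wire format: the `"shape"` and `"data"` keys and the function names are made up for the example.

```python
import json
import numpy as np

def encode_nested(arr):
    """Serialize a 2D array as a JSON list of lists."""
    return json.dumps(arr.tolist())

def encode_flat(arr):
    """Serialize a 2D array as one flat list, with the shape sent separately."""
    return json.dumps({"shape": list(arr.shape), "data": arr.ravel().tolist()})

def decode_flat(payload):
    """Rebuild the 2D array from the flat list and the shape."""
    obj = json.loads(payload)
    return np.asarray(obj["data"]).reshape(obj["shape"])

x = np.random.randn(200, 300)
nested = encode_nested(x)
roundtrip = decode_flat(encode_flat(x))
```

The flat form gives the receiving side a single array to walk instead of one small array per row. The cost is that the shape has to travel alongside the data, which is the compatibility concern raised above.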

Do folks here have the ability to build and test Bokeh locally? If so, I can push a PR for discussion that might make some improvements. Assuming it proves to be of some benefit, we can figure out how to make it robust and validated enough to get included as a start.

Bryan

If there are improvements to 2d-array/image handling performance, I can build and test the changes with datashader to see if things improve. Datashader is directly limited by Bokeh’s 2D capabilities, so this is of great interest to us!

Jim

OK, there's a PR at:

  experimental junk by bryevdv · Pull Request #5429 · bokeh/bokeh · GitHub

This is by no means something that can be merged, but it might show whether something along these lines could be a quicker immediate-term improvement.

Thanks,

Bryan

Bryan, I seem to remember there was a plan for binary (rather than JSON) serialisation/deserialisation which I’m assuming would make a significant difference - is that still planned for a future upgrade?

On Monday, October 31, 2016 at 3:26:28 PM UTC, Bryan Van de ven wrote:

I believe the problem is mostly related to serialization/deserialization. In particular, as part of the server work, 2D arrays started getting serialized as “lists of lists” in JSON (not intentionally, per se; it just fell out of those big changes that way), and this is very slow. So I’m not sure how much something like this would help. That said, you could probably create a custom extension that behaves this way. As for color mapping, it can happen either in Python or in the browser, whichever you want.

The first and best thing to try for the core library is to make sure 2D arrays get flattened before serialization, but this means finding a way of communicating array shape information separately in some way that does not break other cases.
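
To illustrate the flattening idea, here is a minimal sketch in plain Python (standard library only, so it runs without NumPy). It is not Bokeh's actual serializer: the helper names `flatten_with_shape` and `unflatten`, and the `data`/`shape` payload keys, are assumptions made up for this example. The point is only that a flat list plus a small shape record carries the same information as a list of lists.

```python
import json


def flatten_with_shape(rows):
    """Flatten a rectangular list of lists into (flat_values, shape)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if n_rows else 0
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")
    flat = [value for row in rows for value in row]
    return flat, (n_rows, n_cols)


def unflatten(flat, shape):
    """Rebuild the list of lists from the flat values and the shape."""
    n_rows, n_cols = shape
    if len(flat) != n_rows * n_cols:
        raise ValueError("flat data does not match the given shape")
    return [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


# A tiny 2x3 "image" standing in for a large matrix.
image = [[0.0, 1.5, 2.0], [3.0, 4.5, 5.0]]

# Sender: one flat array plus the shape, sent as separate fields (hypothetical keys).
flat, shape = flatten_with_shape(image)
payload = json.dumps({"data": flat, "shape": shape})

# Receiver: decode, then use the shape to restore the 2D structure.
decoded = json.loads(payload)
restored = unflatten(decoded["data"], decoded["shape"])
print(restored == image)
```

With NumPy, `arr.ravel().tolist()` and `arr.shape` would play the roles of `flat` and `shape` here.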

Thanks,

Bryan

Hi,

Yes, that's still the plan, but it is also more involved and riskier. It's also not clear to me that it will have any benefit outside Bokeh server apps. In the case of apps, the low-level protocol was already built to support multi-part messages. So NumPy arrays could be sent without any translation, or even any copying, directly across the wire into browser typed array buffers (we'll just stipulate little-endian order). This is surely the best that can be done, in principle. But the option is only available because the data is sent over a websocket. Still, it's definitely on my list of things we need to get to.
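
A rough sketch of what sending an array "without any translation" could look like on the Python side, using only the standard library. This is an illustration, not the Bokeh server protocol: the two-part layout, the header keys (`dtype`, `order`, `shape`), and the function names are hypothetical. The `<` prefix in the `struct` format string fixes little-endian order, matching the stipulation above. Unlike the zero-copy path described above, `struct.pack` does copy the data.

```python
import json
import struct


def pack_float32_le(values):
    """Pack a flat sequence of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_float32_le(raw):
    """Inverse of pack_float32_le: bytes back to a list of floats."""
    count = len(raw) // 4  # float32 is 4 bytes in struct's standard sizing
    return list(struct.unpack(f"<{count}f", raw))


# A flattened 2x3 image; values chosen to be exactly representable in float32.
pixels = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]

# Hypothetical two-part message: a small text header plus the raw bytes.
header = json.dumps({"dtype": "float32", "order": "little", "shape": [2, 3]})
message = [header.encode("utf-8"), pack_float32_le(pixels)]

# The receiver reads the header, then interprets the payload without parsing text.
meta = json.loads(message[0].decode("utf-8"))
restored = unpack_float32_le(message[1])
print(meta["shape"], len(message[1]), restored)
```

In a browser, a payload like this could back a typed array such as `Float32Array` rather than being parsed number by number.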

In the case of non-server apps, the data has to be encoded as text somehow. There is simply no getting around this, since it's part of the HTML page or in a sidecar .js (text) file that gets loaded. Early experiments seemed to indicate that the average behavior of binary encodings (e.g. base64 or others) could actually *inflate* the data size to transmit. That still might be a win, depending on the cost of the encoding, etc. But it's going to require lots of very detailed analysis that is also going to be difficult to carry out. If there are experts in these areas, I would absolutely welcome any and all help.
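
The size question can be probed with a small experiment. The sketch below (standard library only; the sample data and variable names are arbitrary choices for illustration) encodes the same float64 values three ways. Base64 always expands raw bytes by a factor of about 4/3, so it is larger than true binary; how it compares with JSON text depends on how many digits each number needs.

```python
import base64
import json
import random
import struct

# Arbitrary sample data: 10,000 normally distributed float64 values.
random.seed(0)
values = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Raw binary: 8 bytes per value, little-endian.
raw = struct.pack(f"<{len(values)}d", *values)

# Text-safe encodings of the same data.
as_base64 = base64.b64encode(raw)
as_json = json.dumps(values).encode("utf-8")

print("raw bytes:   ", len(raw))
print("base64 bytes:", len(as_base64))
print("JSON bytes:  ", len(as_json))

# Base64 round-trips the exact bit patterns.
decoded = struct.unpack(f"<{len(values)}d", base64.b64decode(as_base64))
print("round trip exact:", decoded == tuple(values))
```

For full-precision random floats like these, the JSON text tends to be the largest of the three. For short integers or heavily rounded values it can come out smaller than base64, which is one way a binary-to-text encoding can inflate the data.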

There's also potentially an option, in the case of the notebook, to use the notebook comms websocket to send data. So great! We could make an improvement for notebooks at least, right? Except notebook comms require a running kernel, so going a comms-only route means that static notebook plots would not function at all. Is that a reasonable thing? Probably not. Maybe some kind of hybrid approach could be made, but that adds complexity, and these parts of the library around the notebook are already hard to maintain. Unfortunately the space of use-cases is large, and there is a minefield of trade-offs.

Thanks,

Bryan

On Nov 1, 2016, at 2:45 AM, Marcus Donnelly wrote:

Bryan, I seem to remember there was a plan for binary (rather than JSON) serialisation/deserialisation which I'm assuming would make a significant difference - is that still planned for a future upgrade?

Thanks for the info. If I understand correctly, it would be relatively (!) straightforward to do it for server apps due to the use of websockets and harder for non-server apps. This might simply mean that those interested in interacting with large volumes of data might be better off writing a server app once the binary option is there. The point being that the server opens up many interaction opportunities (e.g. streaming data, sliders to move through large multi-dimensional datasets etc), so if that’s what you need you might be better off using it. For plotting (e.g. heatmap) large volumes of data where interaction isn’t important there are other (e.g. static image) possibilities as previously discussed.

On Tuesday, November 1, 2016 at 2:05:12 PM UTC, Bryan Van de ven wrote:

Hi,

Yes, that’s still the plan, but it is also more involved and riskier. It’s also not clear to me that it will have any benefit outside Bokeh server apps. In the case of apps, the low-level protocol was already built to support multi-part messages. So NumPy arrays could be sent without any translation, or even any copying, directly across the wire into browser typed array buffers (we’ll just stipulate little-endian order). This is surely the best that can be done, in principle. But the option is only available because the data is sent over a websocket. Still, it’s definitely on my list of things we need to get to.

In the case of non-server apps, the data has to be encoded as text somehow. There is simply no getting around this, since it’s part of the HTML page or in a sidecar .js (text) file that gets loaded. Early experiments seemed to indicate that the average behavior of binary encodings (e.g. base64 or others) could actually inflate the data size to transmit. That still might be a win, depending on the cost of the encoding, etc. But it’s going to require lots of very detailed analysis that is also going to be difficult to carry out. If there are experts in these areas, I would absolutely welcome any and all help.

There’s also potentially an option, in the case of the notebook, to use the notebook comms websocket to send data. So great! We could make an improvement for notebooks at least, right? Except notebook comms require a running kernel, so going a comms-only route means that static notebook plots would not function at all. Is that a reasonable thing? Probably not. Maybe some kind of hybrid approach could be made, but that adds complexity, and these parts of the library around the notebook are already hard to maintain. Unfortunately the space of use-cases is large, and there is a minefield of trade-offs.

Thanks,

Bryan

Hello, looks like some work was put into this issue. Has there been some improvement in newer versions of bokeh/datashader for this?

On Tue, Nov 1, 2016 at 3:34 PM, Marcus Donnelly wrote:

Thanks for the info. If I understand correctly, it would be relatively (!) straightforward to do it for server apps due to the use of websockets and harder for non-server apps. This might simply mean that those interested in interacting with large volumes of data might be better off writing a server app once the binary option is there. The point being that the server opens up many interaction opportunities (e.g. streaming data, sliders to move through large multi-dimensional datasets etc), so if that’s what you need you might be better off using it. For plotting (e.g. heatmap) large volumes of data where interaction isn’t important there are other (e.g. static image) possibilities as previously discussed.

Arrays were switched to a base64 encoding in 0.12.4, which resulted in a significant performance improvement. You can find some benchmarks in the release announcement:

  https://bokeh.github.io/blog/2017/1/6/release-0-12-4/

There is still an open issue to adopt a true binary protocol for the server case, which may result in even greater improvements:
  
  Improve data transfer, using a binary transfer protocol · Issue #5984 · bokeh/bokeh · GitHub

But no one has had the opportunity to do this work yet.

Thanks,

Bryan

On Aug 21, 2017, at 02:52, Michael Hansen wrote:

Hello, looks like some work was put into this issue. Has there been some improvement in newer versions of bokeh/datashader for this?

On Tue, Nov 1, 2016 at 3:34 PM, Marcus Donnelly <[email protected]> wrote:
Thanks for the info. If I understand correctly, it would be *relatively* (!) straightforward to do it for server apps due to the use of websockets and harder for non-server apps. This might simply mean that those interested in interacting with large volumes of data might be better off writing a server app once the binary option is there. The point being that the server opens up many interaction opportunities (e.g. streaming data, sliders to move through large multi-dimensional datasets etc), so if that's what you need you might be better off using it. For plotting (e.g. heatmap) large volumes of data where interaction isn't important there are other (e.g. static image) possibilities as previously discussed.

On Tuesday, November 1, 2016 at 2:05:12 PM UTC, Bryan Van de ven wrote:
Hi,

Yes that's still the plan but that is also more involved and more risky. It's also not clear to me that it will have any benefit outside Bokeh server apps. In the case of apps, the low level protocol was already built to support multi-part messages. So NumPy arrays could be sent without any translation, or even any copying, directly across the wire into browser typed array buffers (we'll just stipulate little-endian order). This is surely the best that can be done, in principle. But the option is only available because the data is sent over a websocket. Still, it's definitely on my list of things we need to get to.

In the case of non-server apps, the data has to be encoded as text somehow. There is simply no getting around this in any way, since it's part of the HTML page or in a sidecar .js (text) file that gets loaded. Early experiments seemed to indicate that the average behavior of binary encodings (e.g. base64 or others) could actually *inflate* the data size to transmit. But that still might be a win, depending on the cost of the encoding, etc. But it's going to require lots of very detailed analysis that is also going to be difficult to carry out. If there are experts in these areas I would absolutely welcome any and all help.
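
As a rough illustration of the size trade-off, the sketch below compares the raw byte size of an array with its base64 text encoding and with a JSON "list of lists" encoding. The array size and dtype are arbitrary choices for the example, and the sketch measures sizes only, not encoding or decoding cost.

```python
import base64
import json
import numpy as np

x = np.random.randn(200, 200).astype(np.float32)

raw = x.tobytes()                  # binary payload, 4 bytes per value
b64 = base64.b64encode(raw)        # text-safe encoding of the same bytes
as_json = json.dumps(x.tolist())   # nested lists of decimal numbers

# base64 emits 4 output characters for every 3 input bytes.
print("raw bytes:   ", len(raw))
print("base64 chars:", len(b64))
print("JSON chars:  ", len(as_json))

# Round trip back to an array to confirm nothing is lost.
restored = np.frombuffer(base64.b64decode(b64), dtype=np.float32).reshape(x.shape)
```

base64 is always about a third larger than the raw bytes, so whether it is a win depends on what it is compared against and on how expensive the encode and decode steps are in each runtime.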

There's also potentially a possibility in the case of the notebook to use the notebook comms websocket to send data. So great! We could make an improvement for notebooks at least, right? Except notebook comms require a running kernel, so going a comms-only route means that static notebook plots would not function at all. Is that a reasonable thing? Probably not. Maybe some kind of hybrid approach could be made, but then that adds complexity, and these parts of the library around the notebook are already hard to maintain. Unfortunately the space of use-cases is large, and there is a minefield of trade-offs.

Thanks,

Bryan

> On Nov 1, 2016, at 2:45 AM, Marcus Donnelly <[email protected]> wrote:
>
> Bryan, I seem to remember there was a plan for binary (rather than JSON) serialisation/deserialisation which I'm assuming would make a significant difference - is that still planned for a future upgrade?
>
> On Monday, October 31, 2016 at 3:26:28 PM UTC, Bryan Van de ven wrote:
> I believe the problem is mostly related to serialization/deserialization. In particular, as part of the server work, 2d arrays started getting serialized as "lists of lists" in JSON (not intentionally, per se, it just fell out of those big changes that way) and this is *very* slow. So I'm not sure how much something like this would help. But that said, you could probably create a custom extension that behaved in this way. As for color mapping, it can either happen in python or in the browser, whichever you want.
>
> The first best thing to try for the core library is to make sure 2d arrays get flattened before serialization, but this means finding a way of communicating array shape information separately in some way that does not break other cases.
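
A minimal sketch of the flattening idea, assuming a JSON payload with two hypothetical fields, `shape` and `data` (these names are illustrative and not Bokeh's actual wire format): the 2d array is sent as one flat list, and the shape travels alongside it so the receiver can restore the layout.

```python
import json
import numpy as np

# Hypothetical encoding for illustration; not Bokeh's serialization format.
def encode_flat(arr):
    """Serialize a 2d array as a flat list plus its shape."""
    return json.dumps({"shape": list(arr.shape), "data": arr.ravel().tolist()})

def decode_flat(text):
    """Rebuild the 2d array from the flat list and the shape."""
    obj = json.loads(text)
    return np.array(obj["data"]).reshape(obj["shape"])

x = np.arange(12, dtype=float).reshape(3, 4)
encoded = encode_flat(x)
decoded = decode_flat(encoded)
```

The receiver then deals with a single flat sequence instead of one nested list per row, which is the part that maps naturally onto a typed array in the browser.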
>
> Thanks,
>
> Bryan
>
>
> > On Oct 31, 2016, at 4:06 AM, Rutger Kassies <[email protected]> wrote:
> >
> > Hey,
> >
> > Something that might also give some speedup is to reduce the dynamic range of the data. If you have, for example, a float32 layer, converting it to RGBA (bytes) on the backend means you are still sending 32 bits per value over to the browser. For the data I work with, you often don't need the full dynamic range of the datatype your data is in (especially for viz purposes). If you have, for example, something like relative humidity (0-100), having it to 1 (or maybe even 0) decimal accuracy would be sufficient for visualization. You could convert it to a byte on the backend (data*255/100), send it over, reverse the conversion at the frontend, and then apply the colormap.
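
The byte conversion described above could be sketched as follows. The function names `quantize` and `dequantize` are hypothetical, and the 0-100 range is the relative humidity example from the text. In practice the reverse step would run in the browser; it is written in Python here only to check the round trip.

```python
import numpy as np

# Hypothetical helpers for illustration only.
def quantize(data, lo, hi):
    """Map values in [lo, hi] onto the 256 levels of a uint8."""
    scaled = (np.clip(data, lo, hi) - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

def dequantize(q, lo, hi):
    """Approximate inverse of quantize."""
    return q.astype(np.float32) / 255.0 * (hi - lo) + lo

humidity = np.random.uniform(0, 100, size=(1200, 1200)).astype(np.float32)
q = quantize(humidity, 0.0, 100.0)
restored = dequantize(q, 0.0, 100.0)

max_err = float(np.abs(restored - humidity).max())
print(humidity.nbytes, q.nbytes, max_err)
```

The payload shrinks to a quarter of the float32 size. Note that one byte gives steps of about 0.39 over a 0-100 range, so the worst-case error is roughly 0.2; holding one-decimal accuracy over that range would need a 16-bit integer instead.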
> >
> > I'm not sure where Bokeh's colormaps are currently applied (before or after sending to the browser), but I think technically it could be done in the browser. If this were combined with something like Matplotlib's concept of normalizers (data scaling), it might result in some speedup for cases where the datatype can be reduced to something lower than 32 bits.
> >
> > Regards,
> > Rutger
> >
> >
> > On Sunday, October 30, 2016 at 10:46:10 AM UTC+1, James A. Bednar wrote:
> > Michael,
> >
> > Datashader's actual rendering time should be in microseconds for an array of that size, so I suppose your browser is slower than mine at displaying large images. Yes, you can change how the data is rendered, e.g.:
> >
> > import datashader.transfer_functions as tf
> > from xarray import DataArray
> > from matplotlib.cm import viridis
> > tf.shade(DataArray(x), cmap=viridis, how='linear')
> >
> > from bokeh.palettes import Spectral11
> > tf.shade(DataArray(x), cmap=Spectral11, how='log')
> >
> > See the documentation at http://datashader.readthedocs.io/en/latest/api.html#datashader.transfer_functions.shade , along with examples in the pipeline and census notebooks on Anaconda.org. The census example and most others show how to zoom, pan, etc. using InteractiveImage, and the landsat example shows how to resample an existing raster like you have here (rather than rasterizing individual points as in most other examples).
> >
> > Jim
> >
> > On Sun, Oct 30, 2016 at 7:01 AM, Michael Hansen <[email protected]> wrote:
> > Hi Jim and Bryan,
> > Thank you for the comments. I am beginning to understand the difficulties involved in this.
> > I tried your datashader example, Jim, and it does indeed render fast. It takes around 4-5 seconds, compared to Bokeh's 17 seconds and MPL's 1 second. However, as you say, datashader does not downsample like MPL does, which is a good thing, so that is worth the extra seconds.
> > The 1200x1200 image was just an example I made; in reality the matrices and images we are trying to visualize are somewhat bigger.
> > One question, then: what default colormap does datashader use, and over what range? And can I change that colormap?
> >
> > I got triggered by your comment Jim:
> >
> > "As Bryan indicated, where datashader will be most useful is if you have images much larger than your screen resolution, or at least much larger than whatever plot size you choose. Then datashader will let you keep the full array at the Python side, which can handle it just fine. Datashader then passes only a much downsampled version to the browser, updated each time you zoom to make it feel like the full resolution was there the whole time."
> >
> > Does that mean that I can somehow have my big image on the server, and use some frontend to interact with it (zoom/pan etc.)?
> >
> > If yes, this could be very interesting. Do you maybe have a small example of how this could be done with a big image - say 10000x8000 ?
> >
> > Thanks a lot and best regards
> > Michael
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to bokeh+un...@continuum.io.
> > To post to this group, send email to bo...@continuum.io.
> > To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/772a0c5e-fe30-41e0-b354-983579ceb72c%40continuum.io.
> > For more options, visit https://groups.google.com/a/continuum.io/d/optout.