On Aug 21, 2017, at 02:52, Michael Hansen <[email protected]> wrote:
Hello, looks like some work was put into this issue. Has there been some improvement in newer versions of bokeh/datashader for this?
On Tue, Nov 1, 2016 at 3:34 PM, Marcus Donnelly <[email protected]> wrote:
Thanks for the info. If I understand correctly, it would be *relatively* (!) straightforward to do it for server apps due to the use of websockets and harder for non-server apps. This might simply mean that those interested in interacting with large volumes of data might be better off writing a server app once the binary option is there. The point being that the server opens up many interaction opportunities (e.g. streaming data, sliders to move through large multi-dimensional datasets etc), so if that's what you need you might be better off using it. For plotting (e.g. heatmap) large volumes of data where interaction isn't important there are other (e.g. static image) possibilities as previously discussed.
On Tuesday, November 1, 2016 at 2:05:12 PM UTC, Bryan Van de ven wrote:
Hi,
Yes that's still the plan but that is also more involved and more risky. It's also not clear to me that it will have any benefit outside Bokeh server apps. In the case of apps, the low level protocol was already built to support multi-part messages. So NumPy arrays could be sent without any translation, or even any copying, directly across the wire into browser typed array buffers (we'll just stipulate little-endian order). This is surely the best that can be done, in principle. But the option is only available because the data is sent over a websocket. Still, it's definitely on my list of things we need to get to.
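A minimal sketch of the binary idea described above, using only NumPy. The array is forced to little-endian order, shipped as raw bytes, and rebuilt on the receiving side from the bytes plus a tiny header. The `pack`/`unpack` names and the header layout are assumptions for illustration, not Bokeh's actual wire protocol.

```python
import numpy as np

def pack(arr):
    """Split an array into a small header and a raw little-endian payload."""
    # newbyteorder("<") requests little-endian; on little-endian hosts no byte swap happens.
    le = np.ascontiguousarray(arr, dtype=arr.dtype.newbyteorder("<"))
    header = {"dtype": le.dtype.str, "shape": list(le.shape)}
    # tobytes() makes one copy; a real transport could hand over the buffer itself.
    return header, le.tobytes()

def unpack(header, payload):
    """Rebuild the array from the header and raw bytes, with no text parsing."""
    return np.frombuffer(payload, dtype=header["dtype"]).reshape(header["shape"])

image = np.arange(12, dtype=np.float32).reshape(3, 4)
header, payload = pack(image)
restored = unpack(header, payload)
```

On the receiving side `np.frombuffer` wraps the bytes without copying them, which is the Python analogue of filling a browser typed array straight from the message.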
In the case of non-server apps, the data has to be encoded as text somehow. There is simply no getting around this in any way, since it's part of the HTML page or in a sidecar .js (text) file that gets loaded. Early experiments seemed to indicate that the average behavior of binary encodings (e.g. base64 or others) could actually *inflate* the data size to transmit. But that still might be a win, depending on the cost of the encoding, etc. But it's going to require lots of very detailed analysis that is also going to be difficult to carry out. If there are experts in these areas I would absolutely welcome any and all help.
There's also potentially a possibility in the case of the notebook to use the notebook comms websocket to send data. So great! We could make an improvement for notebooks at least, right? Except notebook comms require a running kernel, so going a comms-only route means that static notebook plots would not function at all. Is that a reasonable thing? Probably not. Maybe some kind of hybrid approach could be made, but then that adds complexity, and these parts of the library around the notebook are already hard to maintain. Unfortunately the space of use-cases is large, and there is a minefield of trade-offs.
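To make the size trade-off concrete, here is a rough sketch that counts characters for the same array encoded as a compact JSON list and as a base64 string. The helper name `encoded_sizes` and the two test arrays are made up for illustration, and compression and encode/decode cost are ignored entirely.

```python
import base64
import json
import numpy as np

def encoded_sizes(arr):
    """Character counts for a JSON list and a base64 string of the same array."""
    as_json = json.dumps(arr.ravel().tolist(), separators=(",", ":"))
    as_b64 = base64.b64encode(arr.tobytes()).decode("ascii")
    return {"raw_bytes": arr.nbytes, "json": len(as_json), "base64": len(as_b64)}

# Integer-valued floats: short decimal text, but still 8 raw bytes each.
small = (np.arange(1000) % 10).astype(np.float64)
# Full-precision floats: long decimal text.
noisy = np.random.default_rng(0).random(1000)

small_sizes = encoded_sizes(small)
noisy_sizes = encoded_sizes(noisy)
print(small_sizes)
print(noisy_sizes)
```

Base64 always costs about 4/3 of the raw byte count, so whether it beats JSON depends on how many decimal digits the values need: for the integer-valued array the JSON text is shorter, for the full-precision array base64 is shorter.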
Thanks,
Bryan
> On Nov 1, 2016, at 2:45 AM, Marcus Donnelly <[email protected]> wrote:
>
> Bryan, I seem to remember there was a plan for binary (rather than JSON) serialisation/deserialisation which I'm assuming would make a significant difference - is that still planned for a future upgrade?
>
> On Monday, October 31, 2016 at 3:26:28 PM UTC, Bryan Van de ven wrote:
> I believe the problem is mostly related to serialization/deserialization. In particular, as part of the server work 2d arrays started getting serialized as "lists of lists" in JSON (not intentionally, per se, it just fell out of those big changes that way) and this is *very* slow. So I'm not sure how much something like this would help. But that said, you could probably create a custom extension that behaved in this way. As for color mapping, it can either happen in Python or in the browser, whichever you want.
>
> The first best thing to try for the core library is to make sure 2d arrays get flattened before serialization, but this means finding a way of communicating array shape information separately in some way that does not break other cases.
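A small sketch of the flatten-plus-shape idea, staying with JSON. The payload keys `shape` and `data` and both function names are hypothetical, chosen only for this example; they are not what Bokeh uses.

```python
import json
import numpy as np

def to_payload(arr):
    """Serialize an array as one flat list plus its shape, instead of nested lists."""
    return json.dumps({"shape": list(arr.shape), "data": arr.ravel().tolist()})

def from_payload(text, dtype=np.float64):
    """Rebuild the array by reshaping the flat list with the transmitted shape."""
    obj = json.loads(text)
    return np.asarray(obj["data"], dtype=dtype).reshape(obj["shape"])

image = np.arange(12, dtype=np.float64).reshape(3, 4)
restored = from_payload(to_payload(image))
```

The receiver only has to walk one flat list, and the shape travels as two small integers alongside it.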
>
> Thanks,
>
> Bryan
>
>
> > On Oct 31, 2016, at 4:06 AM, Rutger Kassies <[email protected]> wrote:
> >
> > Hey,
> >
> > Something which might also give some speedup is to reduce the dynamic range of the data. If you for example have a float32 layer, converting it to RGBA (bytes) on the backend means you are still sending over 32 bits per pixel to the browser. For the data which I work with, you often don't need the full dynamic range of the datatype your data is in (especially for viz purposes). If you have for example something like relative humidity (0-100), having it to 1 (or maybe even 0) decimal accuracy would be sufficient for visualization. You could convert it to a byte on the backend (data*255/100), send it over, reverse the conversion at the frontend, and then apply the colormap.
> >
> > I'm not sure where Bokeh's colormaps are currently applied (before or after sending to the browser), but I think technically it could be done in the browser. If this were combined with something like Matplotlib's concept of normalizers (data scaling), it might result in some speedup for cases where the datatype can be reduced to something smaller than 32 bits.
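The byte-quantization idea above could look roughly like this in NumPy. All function names, the humidity values, and the grey-ramp palette are invented for illustration; in the scenario described, `dequantize` and `apply_palette` would run in the browser rather than in Python.

```python
import numpy as np

def quantize(data, vmin, vmax):
    """Scale data in [vmin, vmax] to uint8 codes 0..255 (lossy)."""
    scaled = (np.clip(data, vmin, vmax) - vmin) / (vmax - vmin)
    return np.round(scaled * 255).astype(np.uint8)

def dequantize(codes, vmin, vmax):
    """Approximate inverse of quantize()."""
    return codes.astype(np.float32) / 255 * (vmax - vmin) + vmin

def apply_palette(codes, palette):
    """Look up an RGBA colour per code; palette has shape (256, 4)."""
    return palette[codes]

humidity = np.random.default_rng(0).uniform(0, 100, size=(4, 5)).astype(np.float32)
codes = quantize(humidity, 0.0, 100.0)
approx = dequantize(codes, 0.0, 100.0)

# Hypothetical grey ramp standing in for a real colormap.
ramp = np.arange(256, dtype=np.uint8)
palette = np.stack([ramp, ramp, ramp, np.full(256, 255, dtype=np.uint8)], axis=1)
rgba = apply_palette(codes, palette)
```

One byte per pixel is a quarter of the float32 or RGBA size. Note that 256 levels over 0-100 give a step of about 0.39, so genuine one-decimal accuracy would need a 16-bit code instead.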
> >
> > Regards,
> > Rutger
> >
> >
> > On Sunday, October 30, 2016 at 10:46:10 AM UTC+1, James A. Bednar wrote:
> > Michael,
> >
> > Datashader's actual rendering time should be in microseconds for an array of that size, so I suppose your browser is slower than mine at displaying large images. Yes, you can change how the data is rendered, e.g.:
> >
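> > import datashader.transfer_functions as tf
> > from xarray import DataArray
> > from matplotlib.cm import viridis
> > from bokeh.palettes import Spectral11
> >
> > # x: the 2D NumPy array being plotted (defined elsewhere)
> > tf.shade(DataArray(x), cmap=viridis, how='linear')
> > tf.shade(DataArray(x), cmap=Spectral11, how='log')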
> >
> > See the documentation at http://datashader.readthedocs.io/en/latest/api.html#datashader.transfer_functions.shade , along with examples in the pipeline and census notebooks on Anaconda.org. The census example and most others show how to zoom, pan, etc. using InteractiveImage, and the landsat example shows how to resample an existing raster like you have here (rather than rasterizing individual points as in most other examples).
> >
> > Jim
> >
> > On Sun, Oct 30, 2016 at 7:01 AM, Michael Hansen <[email protected]> wrote:
> > Hi Jim and Bryan,
> > Thank you for the comments. I am beginning to understand the difficulties involved in this.
> > I tried your datashader example, Jim, and it does indeed render fast. It takes around 4-5 seconds compared to Bokeh's 17 seconds and MPL's 1 second. However, as you say, datashader does not downsample like MPL does, which is a good thing, so that is worth the extra seconds.
> > The 1200x1200 image was just an example I made. In reality the matrices and images we are trying to visualize are somewhat bigger.
> > One question is then, what default colormap does datashader use and what range? And can I change that colormap?
> >
> > I got triggered by your comment Jim:
> >
> > "As Bryan indicated, where datashader will be most useful is if you have images much larger than your screen resolution, or at least much larger than whatever plot size you choose. Then datashader will let you keep the full array at the Python side, which can handle it just fine. Datashader then passes only a much downsampled version to the browser, updated each time you zoom to make it feel like the full resolution was there the whole time."
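The principle in the quoted passage (keep the full array in Python, send only a screen-sized view per zoom) can be sketched with plain NumPy. This is not datashader's algorithm: datashader aggregates the source pixels, whereas this stand-in just picks evenly spaced ones. The function name `render_viewport` and the array sizes are made up for the example.

```python
import numpy as np

def render_viewport(full, row_range, col_range, out_shape):
    """Return a small image for the requested window of a large array."""
    r0, r1 = row_range
    c0, c1 = col_range
    out_h, out_w = out_shape
    # Pick at most out_h x out_w source pixels, evenly spaced (nearest-neighbour).
    rows = np.linspace(r0, r1 - 1, min(out_h, r1 - r0)).round().astype(int)
    cols = np.linspace(c0, c1 - 1, min(out_w, c1 - c0)).round().astype(int)
    return full[np.ix_(rows, cols)]

# The full-resolution array stays on the Python side.
full = np.random.default_rng(0).random((2000, 1600), dtype=np.float32)

# Whole image squeezed into a 400x300 plot.
overview = render_viewport(full, (0, 2000), (0, 1600), (400, 300))
# After zooming in, the window is smaller than the plot, so every pixel is sent.
zoomed = render_viewport(full, (500, 700), (800, 950), (400, 300))
```

Each pan or zoom would call `render_viewport` again with the new ranges, so the amount of data sent to the browser is bounded by the plot size rather than by the size of the full image.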
> >
> > Does that mean that I can somehow have my big image on the server, and use some frontend to interact with it (zoom/pan etc.)?
> >
> > If yes, this could be very interesting. Do you maybe have a small example of how this could be done with a big image - say 10000x8000 ?
> >
> > Thanks a lot and best regards
> > Michael
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to bokeh+un...@continuum.io.
> > To post to this group, send email to bo...@continuum.io.
> > To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/772a0c5e-fe30-41e0-b354-983579ceb72c%40continuum.io.
> > For more options, visit https://groups.google.com/a/continuum.io/d/optout.