bokeh performance [was: Fwd: [bokeh] Histogram of NumPy array]

[Accidentally responded off-list]

Begin forwarded message:

From: Bryan Van de Ven <[email protected]>
Subject: Re: [bokeh] Histogram of NumPy array
Date: January 15, 2014 at 9:01:33 PM CST
To: [email protected]

Hi Kevin,

The first thing to note is that Bokeh is really two libraries: "bokehjs", a javascript library for interactive plotting that can be used standalone to develop interactive visualizations in the browser, and "bokeh" the python library that can communicate with bokehjs to drive browser-based visualizations from python. I'm not entirely sure how to respond to your question (since I'm not intimately familiar with the internals of MPL) but I think there are a few general points worth mentioning:

* the python part of Bokeh does not generate and emit lots of javascript. The bokehjs library takes as input plot "specifications" in the form of lightweight json objects. Generating these json objects that will be consumed by bokehjs is the primary task of the python part of Bokeh. It's also trivial and extremely quick to do, so the overhead on the python side is negligible. Once the specification makes it to the browser (by being embedded in a static HTML file, or by being served from a bokeh plot server) the bokehjs library takes over.

* bokehjs is built on HTML5 canvas, which has become a very performant platform for rendering 2D graphics. Browser JS engines have likewise seen dramatic improvements in the last several years, especially when combined with things like typed arrays, which we make use of whenever we can in bokehjs.

* The Bokeh devs have a good deal of experience working on interactive plotting libraries. Peter is the principal author of Chaco, which I also worked on, and several years ago I developed a similar C++ library focused on high interactivity. We care a great deal about interactive performance and consider it from the very beginning. We avoid computations like screen mappings until we must do them, and we use spatial indices both to skip computations altogether for data that would otherwise render offscreen and to speed up selection hit testing.
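
To make the first point above a bit more concrete, here is a minimal sketch of what "generating a specification" amounts to on the python side. The `make_spec` helper and the field names (`glyph`, `data`, `x_range`, `y_range`) are hypothetical and made up for illustration; they are not the actual bokehjs schema. The only point is that python builds a plain data structure and serializes it, and does no rendering work itself.

```python
import json


def make_spec(x, y, glyph="circle"):
    """Build a plain-dict plot description.

    Hypothetical schema for illustration only, not the real bokehjs format.
    """
    return {
        "glyph": glyph,
        "data": {"x": list(x), "y": list(y)},
        "x_range": {"start": min(x), "end": max(x)},
        "y_range": {"start": min(y), "end": max(y)},
    }


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 4.0, 9.0, 16.0]

# Serializing the description is the only work done in python here;
# any drawing would happen later, in the browser.
payload = json.dumps(make_spec(xs, ys))
print(payload)
```

Building and dumping a dict like this costs very little compared with rasterizing a plot, which is roughly why the python side is not the bottleneck.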

Even so, there are still optimizations to add and explore. We will be adding dynamic downsampling and other Level-of-Detail optimizations to support interactivity with even larger data sets. We are going to investigate "compiled" paths for drawing all the glyphs on the HTML5 canvas. We also plan on instrumenting our code with something like rstats in order to monitor performance and prevent performance regressions.
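
As a rough idea of what downsampling could look like, here is a small sketch of one common approach, min/max decimation: split the data into buckets and keep only the extremes of each bucket, so narrow peaks survive. This is an assumption about one possible strategy, not a description of what Bokeh does or will do, and `minmax_downsample` is a name invented for this example.

```python
def minmax_downsample(values, n_buckets):
    """Keep only the min and max of each bucket so peaks survive decimation.

    One possible level-of-detail strategy, shown for illustration only.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    if len(values) <= 2 * n_buckets:
        return list(values)  # already small enough, nothing to drop
    out = []
    size = len(values) / n_buckets  # fractional bucket width
    for i in range(n_buckets):
        bucket = values[int(i * size):int((i + 1) * size)]
        lo_i = min(range(len(bucket)), key=bucket.__getitem__)
        hi_i = max(range(len(bucket)), key=bucket.__getitem__)
        # Emit in original order so a drawn line keeps its shape.
        for j in sorted({lo_i, hi_i}):
            out.append(bucket[j])
    return out


signal = [0] * 1000
signal[137] = 50  # a narrow spike that naive striding could miss
reduced = minmax_downsample(signal, 10)
print(len(reduced), max(reduced))
```

Taking every 100th sample of `signal` instead would drop the spike entirely, which is why the extremes are kept rather than a fixed stride.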

Hopefully this sheds some light on the architecture of Bokeh. Other Bokeh devs might chime in with their own thoughts, as well.

Thanks,

Bryan

On Jan 15, 2014, at 5:41 PM, [email protected] wrote:

Hey Bryan, how is Bokeh so much faster than matplotlib? I was under the impression that bokeh was pure python in the plumbing, am I wrong?

Thanks!

On Sunday, December 29, 2013 3:56:56 PM UTC-8, Bryan Van de Ven wrote:

Kevin,

Glad to help. One thing I should mention is that an MPL compatibility layer is on the roadmap, which should make it easy for people to use the python ggplot package, as well as seaborn, while targeting bokeh plots as the output. We will definitely be adding more schematized and high-level functions to the bokeh API as well, but we wanted to make sure to build up from a very flexible and composable foundation.

Bryan

On Dec 29, 2013, at 5:28 PM, Kevin <[email protected]> wrote:

Holy moly, that is verbose! Thanks for your help Bryan. I think ggplot has spoiled me, but I'm sure there is a method to your madness. It probably involves not abstracting away so much and doing away with too much 'magic' so you can have more granularity around visuals.

On Sun, Dec 29, 2013 at 3:24 PM, Bryan Van de Ven <[email protected]> wrote:
Hi Kevin,

Yes, right now this is a little more clunky than it should be, but it is definitely possible.

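       import numpy as np
       from bokeh.plotting import output_file, rect, show
       from bokeh.objects import Range1d

       # 10000 samples from a normal distribution
       mu, sigma = 100, 15
       x = mu + sigma * np.random.randn(10000)

       # counts per bin, plus the 51 bin edges
       hist, bins = np.histogram(x, bins=50)
       # bars at 70% of the bin width, so there are gaps between them
       width = 0.7 * (bins[1] - bins[0])
       # midpoint of each bin
       center = (bins[:-1] + bins[1:]) / 2

       output_file("/tmp/hist.html")

       # rect takes the center of each rectangle, so a bar standing on y=0
       # with height hist is centered at hist/2; the y range is set by hand
       rect(center, hist/2.0, width, hist, y_range=Range1d(start=0,end=700))

       show()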

In the near future, look for a bars() function to make this a little easier, and also for auto range scaling that works with glyphs like rect (it currently works well with "pointlike" marker glyphs but needs some extra logic to work well with glyphs like rect that have "extent").
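
To illustrate the "extent" issue, here is a small sketch comparing a range computed from glyph centers alone with one that accounts for each glyph's size. The helper names `point_range` and `extent_range` and the bin counts are made up for this example; this is not Bokeh's actual auto-ranging code.

```python
def point_range(centers):
    """Range computed as if each glyph were a point at its center."""
    return min(centers), max(centers)


def extent_range(centers, sizes):
    """Range that accounts for each glyph extending size/2 either side of its center."""
    lows = [c - s / 2.0 for c, s in zip(centers, sizes)]
    highs = [c + s / 2.0 for c, s in zip(centers, sizes)]
    return min(lows), max(highs)


hist = [2, 10, 30, 10, 2]             # made-up bin counts
y_centers = [h / 2.0 for h in hist]   # rects are centered at half their height
heights = hist

print(point_range(y_centers))            # covers only the centers
print(extent_range(y_centers, heights))  # covers the full bars
```

With the center-only range the tallest bar would be cut off at half its height, which is why the example above sets y_range by hand.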

Bryan

On Dec 29, 2013, at 12:56 AM, Kevin <[email protected]> wrote:

Thanks Bryan, sorry for missing the obvious. Do you have any examples of histograms?

On Dec 28, 2013, at 21:00, Bryan Van de Ven <[email protected]> wrote:

Hi,

We are definitely interested in improving pandas integration, but also interested in having minimal hard dependencies. You can definitely plot directly from numpy arrays, check out:

http://bokeh.pydata.org/plot_gallery/stocks.html
http://bokeh.pydata.org/plot_gallery/lorenz_example.html
http://bokeh.pydata.org/plot_gallery/color_scatter_example.html
http://bokeh.pydata.org/plot_gallery/correlation.html

for some examples that don't involve pandas.

Bryan

On Dec 28, 2013, at 7:58 PM, [email protected] wrote:

Is Bokeh like yhat's ggplot in that all data must be converted to a pandas df before plotting? I have some large numpy arrays that I would love to plot histograms of using Bokeh.

--
You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/45acd5d5-68d1-494d-b23f-983e9deff25e%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/groups/opt_out.
