performance improvements for bokehjs

Paddy_Mullen · November 29, 2013, 9:03pm

I just had a very productive conversation with Bryan.

We touched on
* datasource performance

We started with the ObjectArrayDataSource, which makes it convenient to plot small numbers of glyphs. Our prime use case, though, is lots of points. Internally bokehjs initially presents ColumnDataSources to the renderers via the same interface that ObjectArrayDataSource does (`data points`). The circle renderer then takes this array-of-dictionaries representation and converts it into a dictionary of arrays, which is what the ColumnDataSource originally had.

Bryan isn't sure that we even need an ObjectArrayDataSource, but if we keep it, it should be implemented in terms of the ColumnDataSource.
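To make the two layouts concrete, here is a rough sketch in TypeScript rather than CoffeeScript. The type and function names (`Row`, `Columns`, `rowsToColumns`, `rowAt`) are made up for illustration and are not the actual bokehjs API. The first function is the conversion the circle renderer does today; the second shows how a row-style interface could sit on top of column storage without copying everything.

```typescript
// Hypothetical types for illustration; not the real bokehjs classes.
type Row = Record<string, number>;
type Columns = Record<string, number[]>;

// Array of dictionaries -> dictionary of arrays.
function rowsToColumns(rows: Row[]): Columns {
  const columns: Columns = {};
  for (let i = 0; i < rows.length; i++) {
    const row = rows[i];
    for (const key of Object.keys(row)) {
      if (!(key in columns)) {
        // Pre-fill with NaN so every column has one slot per row,
        // even if some rows are missing this key.
        const column: number[] = [];
        for (let j = 0; j < rows.length; j++) {
          column.push(NaN);
        }
        columns[key] = column;
      }
      columns[key][i] = row[key];
    }
  }
  return columns;
}

// Dictionary of arrays -> a single row built on demand, so a row-oriented
// source could be implemented in terms of a column-oriented one.
function rowAt(columns: Columns, index: number): Row {
  const row: Row = {};
  for (const key of Object.keys(columns)) {
    row[key] = columns[key][index];
  }
  return row;
}
```

With this shape, the column form is the only thing stored, and rows only exist for the moment a caller asks for one.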

By making these changes we can use less memory and have better performance. I will try to get to this over the weekend. Any further input?

* when to add options vs new implementations
I have added features to the select tool so that it only actually calls select on mouse up, without plotting intermediate states. I wasn't sure if this should be implemented as an entirely new tool or as an option. Bryan recommended an option on the select tool.
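As a sketch of what "an option on the select tool" could look like (TypeScript, with invented names: `BoxSelectSketch` and `selectOnMouseUpOnly` are assumptions, not the existing tool or its real option name):

```typescript
// Hypothetical names throughout; this is not the existing bokehjs select tool.
interface Box { x0: number; y0: number; x1: number; y1: number; }

class BoxSelectSketch {
  private box: Box | null = null;
  private onSelect: (box: Box) => void;
  private selectOnMouseUpOnly: boolean;

  constructor(onSelect: (box: Box) => void, selectOnMouseUpOnly: boolean = false) {
    this.onSelect = onSelect;
    // Assumed option: when true, skip the select call for intermediate drag states.
    this.selectOnMouseUpOnly = selectOnMouseUpOnly;
  }

  mouseDown(x: number, y: number): void {
    this.box = { x0: x, y0: y, x1: x, y1: y };
  }

  mouseMove(x: number, y: number): void {
    if (this.box === null) return; // not dragging
    this.box = { x0: this.box.x0, y0: this.box.y0, x1: x, y1: y };
    if (!this.selectOnMouseUpOnly) {
      this.onSelect(this.box); // live mode: select on every move
    }
  }

  mouseUp(): void {
    if (this.box === null) return;
    this.onSelect(this.box); // both modes select the final box
    this.box = null;
  }
}
```

The two behaviours share everything except one branch in `mouseMove`, which is the argument for an option rather than a second tool.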

* I have built a ColumnSelectTool

It works as a modal with a list of columns in the datasource. It internally maintains its own list of plotted columns, and when this list changes, it clears off all of the renderers on the plot, and builds new ones for the specific columns.
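Roughly, the rebuild logic looks like the sketch below. This is TypeScript with invented names (`ColumnSelectSketch`, `RendererSpec`), not the tool's real code, and the "renderer" here is just a plain record standing in for a real glyph renderer.

```typescript
// Hypothetical stand-in for a glyph renderer: it only remembers its column.
interface RendererSpec { column: string; }

class ColumnSelectSketch {
  private plotted: string[] = [];
  renderers: RendererSpec[] = [];
  rebuildCount = 0; // only here to make the behaviour observable

  setPlottedColumns(columns: string[]): void {
    const unchanged =
      columns.length === this.plotted.length &&
      columns.every((c, i) => c === this.plotted[i]);
    if (unchanged) return; // nothing to do if the list did not change

    this.plotted = columns.slice();
    // Clear every existing renderer and build fresh ones for the new list.
    this.renderers = this.plotted.map((column) => ({ column: column }));
    this.rebuildCount += 1;
  }
}
```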

This tool brings up a couple of interesting architecture decisions. How should I maintain the GlyphSpecs for the extra columns? I can add this into the UI, but we might want to think about it some. At a minimum each subsequent plot should have a different color.

* updating properties infrastructure

It would be fairly easy to build out a UI for manipulating attributes of plot objects if there was a data structure that told me "A renderer can be of type Circle, Line, Patch…" and "A patch renderer has attributes of Color, line_width…". I think we are closer on the Python side for this.

This should be possible. Although we don't have metaclasses in JavaScript, we do have constructors that we can use to interrogate the just-constructed object and add information to a global registry. We also get getters and setters. While we are at it, it might make sense to have a make_plot_object function that accepts at least one of ViewClass, ModelClass and CollectionClass, and fills in the rest with sensible defaults. Adding a new type of plot object is quite cumbersome right now. Besides writing the new coffee file, I have to modify at least 3 files (main.coffee, base.coffee, and collections.coffee). We should be able to get this down to 2 files.
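A minimal sketch of the registry idea, in TypeScript. Everything named here (`plotObjectRegistry`, `PlotObjectSketch`, `attributesOf`) is hypothetical, and the attribute lists are placeholders apart from the color and line_width mentioned above.

```typescript
// Hypothetical registry: type name -> attribute names with default values.
type AttributeDefaults = Record<string, string | number>;
const plotObjectRegistry: Record<string, AttributeDefaults> = {};

class PlotObjectSketch {
  attributes: AttributeDefaults = {};

  constructor(typeName: string, defaults: AttributeDefaults) {
    for (const name of Object.keys(defaults)) {
      this.attributes[name] = defaults[name];
    }
    // The constructor is the hook: the first instance of each type
    // records what attributes that type has.
    if (!(typeName in plotObjectRegistry)) {
      plotObjectRegistry[typeName] = defaults;
    }
  }
}

class CircleSketch extends PlotObjectSketch {
  constructor() {
    super("Circle", { color: "blue", radius: 5 }); // placeholder attributes
  }
}

class PatchSketch extends PlotObjectSketch {
  constructor() {
    super("Patch", { color: "blue", line_width: 1 });
  }
}

// What a UI builder would ask: which attributes does this type have?
function attributesOf(typeName: string): string[] {
  return typeName in plotObjectRegistry
    ? Object.keys(plotObjectRegistry[typeName]).sort()
    : [];
}
```

One limitation of this approach is that a type only shows up in the registry after something has constructed it once, so a real version would probably want explicit registration as well.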

Bryan mentioned that properties.coffee was his first bit of CoffeeScript and it's probably ripe for improvement. He wants it to be a Backbone model. I'm not clear on all that this entails.

Bryan and I both want an auto-color property. I want it for the ColumnSelectionTool, as an attribute of a ColumnSelectionToolModel. Bryan thinks it could work as its own independent property to be applied to glyph renderers. Either way, its job would be to make sure each renderer gets a different color than the last one (up to the number of colors in the palette).
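The simplest version I can think of is a counter over a palette. This is a sketch in TypeScript; `AutoColorSketch` is an invented name, and wrapping around once the palette runs out is my assumption about what should happen past that limit.

```typescript
// Hypothetical auto-color helper: hands out palette colors in order.
class AutoColorSketch {
  private next = 0;
  private palette: string[];

  constructor(palette: string[]) {
    if (palette.length === 0) {
      throw new Error("palette must not be empty");
    }
    this.palette = palette.slice();
  }

  // Returns the next color; wraps around (assumption) when the palette is exhausted.
  nextColor(): string {
    const color = this.palette[this.next % this.palette.length];
    this.next += 1;
    return color;
  }
}
```

Whether this lives on the tool model or as an independent property, each new renderer would just call `nextColor()` when it is built.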

* where intelligence should lie
The fast select mode that does less, and especially abstract rendering, bring up the question of where intelligence should lie. Should we depend on users to know that they are going to plot 1 million points, that it will be slow, and that they should therefore enable different options? Should the Python side construct objects using sensible defaults and testing so that users don't run into this? Should JS handle it?

* js based downsampling
I think there are advantages to doing JS based downsampling. This could probably get us performant plotting for 50k to 1 million points. The more we can do in JS, the more flexible bokeh will be.
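I haven't settled on an algorithm. As one possibility, here is a sketch of min/max bucket decimation in TypeScript: split the points into buckets and keep only the lowest and highest point in each, so spikes survive. The function name and the bucket-count parameter are made up for this example.

```typescript
// Returns the indices of the points to keep, in increasing order.
// The caller can then pick x[i] and y[i] for each kept index.
function downsampleMinMaxIndices(ys: number[], buckets: number): number[] {
  const n = ys.length;
  const kept: number[] = [];
  if (buckets <= 0 || n <= buckets * 2) {
    // Already small enough: keep everything.
    for (let i = 0; i < n; i++) kept.push(i);
    return kept;
  }
  for (let b = 0; b < buckets; b++) {
    // Integer bucket bounds; the last bucket always ends at n.
    const start = Math.floor((b * n) / buckets);
    const end = Math.floor(((b + 1) * n) / buckets);
    let minI = start;
    let maxI = start;
    for (let i = start + 1; i < end; i++) {
      if (ys[i] < ys[minI]) minI = i;
      if (ys[i] > ys[maxI]) maxI = i;
    }
    // Emit the bucket's extremes in their original order.
    if (minI === maxI) {
      kept.push(minI);
    } else {
      kept.push(Math.min(minI, maxI), Math.max(minI, maxI));
    }
  }
  return kept;
}
```

This keeps at most two points per bucket, so the output size depends on the bucket count (say, the plot width in pixels) and not on the input size.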

I'm really enjoying using bokeh for real work.

Paddy