Bokeh in the webplot landscape: Vega-Lite, Bokeh JSON et al.

Karel · August 7, 2020, 12:21pm

I’ve been looking at web-plot solutions in Python for a new (potentially big) project. Of course I love Bokeh, but I’m also very interested in a sane way to serialize plots. This led me to Vega and its grammar spec, ‘Vega-Lite’. I have not tried it, but the main Python library for using Vega, Altair (https://altair-viz.github.io/), does not appeal to me, as I don’t think it gives the low-level control I need or want.

Are there specs of the current Bokeh JSON? Could we think about some smart way to use the same serialized interface for multiple plot backends?

p-himik · August 7, 2020, 12:32pm

Are there specs of the current Bokeh JSON?

Not at the moment. I believe that maybe Bokeh v3+ will tackle that.

Could we think about some smart way to use the same serialized interface for multiple plot backends?

Even if such a common ground solution is possible, I don’t think there’s much to gain by implementing it. Different plotting libraries have vastly different features and idiosyncrasies.

Karel · August 7, 2020, 12:57pm

True, I’m not 100% sure about the validity of such an approach. It would be nice to define such a standard for my own projects, but indeed, maybe generality across Bokeh, Vega, and matplotlib is a bridge too far (although the Vega-Lite spec tries to solve that).

For information, our client wants to swap out backends on the fly (matplotlib, something webby, and maybe something command-liny like gnuplot). I do want to give a good answer about feasibility, but the answer might indeed just be ‘not worth the effort’.

Bryan · August 7, 2020, 4:07pm

There was once some interest around MEP 25 to provide a JSON serialization scheme for Matplotlib that other tools (Bokeh, Plotly, etc.) could potentially use to integrate with. But there has been no activity on that front in nearly half a decade, and my personal view is that the effort is unlikely to go anywhere. It’s a lot of work, and the original API never had this usage in mind, so there are exhausting “but what do we do here?” discussions around every corner.

As for Bokeh specifically, the foundational difference between Bokeh’s JSON specification and something like Vega is that Bokeh admits real object identity (i.e. “pointers”) to objects, so that e.g. two plots can share one range on both the Python and JS sides, or so that a data source can be synchronized across Python/JS runtimes. Vega does not have that; it’s just composed of bare JSON blocks. That’s a pretty wide conceptual gulf, and I don’t know how you’d bridge it in general.
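To make the identity point concrete, here is a minimal stdlib-only sketch. The dict layouts, the `{"id": ...}` reference stub, and the `resolve` helper are assumptions made up for illustration; they are not actual Bokeh or Vega output.

```python
import json

# Toy layout only: these dicts are illustrative, not real Bokeh or Vega output.
by_reference = {
    "objects": [
        {"id": "r1", "type": "Range1d", "attributes": {"start": 0, "end": 10}},
        {"id": "p1", "type": "Plot", "attributes": {"x_range": {"id": "r1"}}},
        {"id": "p2", "type": "Plot", "attributes": {"x_range": {"id": "r1"}}},
    ]
}
by_value = {
    "plots": [
        {"x_range": {"start": 0, "end": 10}},
        {"x_range": {"start": 0, "end": 10}},
    ]
}


def resolve(doc):
    """Replace {"id": ...} stubs with the object they name; return an id index."""
    index = {obj["id"]: obj for obj in doc["objects"]}
    for obj in doc["objects"]:
        for key, value in list(obj["attributes"].items()):
            if isinstance(value, dict) and set(value) == {"id"}:
                obj["attributes"][key] = index[value["id"]]
    return index


# Round-trip through JSON text, as a second runtime would receive it.
live = resolve(json.loads(json.dumps(by_reference)))
copies = json.loads(json.dumps(by_value))

# One update to the shared range is seen by both plots...
live["r1"]["attributes"]["end"] = 20
shared_ends = [live[p]["attributes"]["x_range"]["attributes"]["end"] for p in ("p1", "p2")]

# ...while updating one inline copy leaves the other untouched.
copies["plots"][0]["x_range"]["end"] = 20
copied_ends = [plot["x_range"]["end"] for plot in copies["plots"]]
```

In the reference style the two plots end up holding the very same range object after the round trip, which is the property a format made of bare JSON blocks has no way to express.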

I think things become more realistic if you start to talk about only going one direction. Converting Vega (or even MPL possibly) to Bokeh is something I could imagine at least partly working to some level, just purely through JSON-level transformations. Going the other direction seems much, much harder.
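As a rough idea of what a one-directional, JSON-level transformation might look like, the sketch below turns a very small Vega-Lite point spec into a list of Bokeh-shaped object dicts. The function name, the id numbering, and the exact attribute layout are hypothetical; the output only mimics the `attributes`/`id`/`type` envelope and is not guaranteed to load in BokehJS.

```python
from itertools import count


def vega_lite_to_bokeh_like(spec):
    """Convert a tiny Vega-Lite 'point' spec into Bokeh-shaped object dicts.

    Hypothetical helper: handles only inline data plus x/y field encodings.
    """
    if spec.get("mark") != "point":
        raise ValueError("this sketch only handles mark == 'point'")

    ids = (str(n) for n in count(1000))  # arbitrary id scheme for the sketch
    rows = spec["data"]["values"]
    x_field = spec["encoding"]["x"]["field"]
    y_field = spec["encoding"]["y"]["field"]

    # Row-oriented records become column-oriented data.
    columns = {f: [row[f] for row in rows] for f in (x_field, y_field)}

    source = {"id": next(ids), "type": "ColumnDataSource",
              "attributes": {"data": columns}}
    glyph = {"id": next(ids), "type": "Scatter",
             "attributes": {"x": {"field": x_field}, "y": {"field": y_field}}}
    renderer = {"id": next(ids), "type": "GlyphRenderer",
                "attributes": {"data_source": {"id": source["id"]},
                               "glyph": {"id": glyph["id"]}}}
    return [source, glyph, renderer]


example_spec = {
    "mark": "point",
    "data": {"values": [{"a": 1, "b": 3}, {"a": 2, "b": 4}]},
    "encoding": {
        "x": {"field": "a", "type": "quantitative"},
        "y": {"field": "b", "type": "quantitative"},
    },
}
objects = vega_lite_to_bokeh_like(example_spec)
```

Even this toy case has to invent ids and cross-references that the source spec never contained, which hints at why the reverse direction is harder: those references would have to be flattened away.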

But even in the “easier” case, the story is the same as with MEP 25. Any of this would require sustained effort on tedious work that, worse, also requires people-coordination and buy-in. Sorry to seem so pessimistic, but speaking honestly, I doubt the incentives will ever be there for this to happen, unless it just becomes some new contributor’s personal labor of love.

Bryan · August 7, 2020, 4:24pm

As for the Bokeh specification, there are sort of two levels to think about:

Organizing Level

This is the fact that at the top level, all Bokeh objects look like this:

{
  "attributes": {
    ....
  },
  "id": "6956",
  "type": "Range1d"
}

This has not changed in a very long time, and is unlikely to ever change. I suppose it could be documented more prominently. A similar statement holds for the general “shape” of Document. Where there is more variability is the individual “object” level.
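The envelope described above is simple enough to check mechanically. This is a minimal sketch; `is_bokeh_like_object` is a hypothetical helper, not part of Bokeh, and it assumes extra top-level keys should be tolerated.

```python
REQUIRED_KEYS = {"attributes", "id", "type"}


def is_bokeh_like_object(obj):
    """Check only the top-level envelope, not the contents of 'attributes'."""
    return (
        isinstance(obj, dict)
        and REQUIRED_KEYS <= set(obj)  # extra keys are tolerated (assumption)
        and isinstance(obj["attributes"], dict)
        and isinstance(obj["id"], str)
        and isinstance(obj["type"], str)
    )


range_obj = {"attributes": {"start": 0, "end": 10}, "id": "6956", "type": "Range1d"}
bare_block = {"start": 0, "end": 10}
```

Here `is_bokeh_like_object(range_obj)` is true and `is_bokeh_like_object(bare_block)` is false. Validating what goes inside `attributes` is the harder, per-model part.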

Object Level

This is the contents of that attributes block above. This is a key-value dict that holds all the Bokeh properties of the Bokeh model. E.g. a Range1d has start and end properties, so:

{
  "start": 0,
  "end": 10,
  ...
}

So the question is where is the ground truth for what that set of properties is? Currently, the Python API is the ground truth, so the Python reference guide is the document for exactly what goes in attributes.

We also have continuous tests to ensure that the BokehJS classes agree 100% with this Python “ground truth”. So Python and JS are guaranteed to be in sync with each other. What we don’t have is an independent, declarative specification outside of both Python and JS. We do have a tool that can dump an ad-hoc JSON blob that describes the total collection of Models and their Properties, but that “spec” is derived from the Python source.
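For readers unfamiliar with the idea of a spec that is derived from Python source, here is a toy illustration using stdlib dataclasses. The `Range1d` and `Title` classes and the `dump_spec` function are stand-ins invented for this sketch; they are not Bokeh's models or its actual dump tool.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Range1d:  # toy stand-in, not bokeh.models.Range1d
    start: float = 0.0
    end: float = 1.0


@dataclass
class Title:  # toy stand-in
    text: str = ""


def dump_spec(*model_classes):
    """Describe each class's properties by introspecting the Python definitions."""
    spec = {}
    for cls in model_classes:
        spec[cls.__name__] = {
            f.name: {
                "type": getattr(f.type, "__name__", str(f.type)),
                "default": f.default,
            }
            for f in fields(cls)
        }
    return spec


spec = dump_spec(Range1d, Title)
spec_json = json.dumps(spec, indent=2, sort_keys=True)
```

The resulting JSON is accurate but secondary: change a class and the dump changes with it, never the other way around.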

We could imagine a situation where some JSON spec is itself primarily the “ground truth”, and then do code generation everywhere based on that spec, e.g. the way REST API models are auto-generated from Swagger files. In that world, Python descends from its special place of prominence and is just another language that implements a binding for the “Bokeh spec”. There are definitely benefits to this approach, but also costs, like anything.
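The spec-first direction can be sketched in a few lines as well. Everything here is hypothetical: `SPEC` is a made-up miniature spec, and `generate_models` builds classes at runtime rather than emitting source files the way a real generator probably would.

```python
# Hypothetical miniature spec: model name -> property name -> default value.
SPEC = {
    "Range1d": {"start": 0, "end": 1},
    "Title": {"text": ""},
}


def _make_model(type_name, defaults):
    """Build one class whose properties come entirely from the spec entry."""

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(defaults)
        if unknown:
            raise AttributeError(f"{type_name} has no properties {sorted(unknown)}")
        for name, default in defaults.items():
            setattr(self, name, kwargs.get(name, default))

    def to_attributes(self):
        # Produce the key-value dict that would go in the "attributes" block.
        return {name: getattr(self, name) for name in defaults}

    return type(type_name, (), {"__init__": __init__, "to_attributes": to_attributes})


def generate_models(spec):
    """Generate one Python class per model described in the spec."""
    return {name: _make_model(name, defaults) for name, defaults in spec.items()}


models = generate_models(SPEC)
r = models["Range1d"](end=10)
attrs = r.to_attributes()
```

In this arrangement a JS or other-language binding could be generated from the same `SPEC`, and none of the bindings would be special.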

Karel · August 7, 2020, 5:09pm

Thanks for your insight Bryan, clear as always. I tend to lean in the same direction.

unless it just becomes some new contributor’s personal labor of love.

Or a willing paying client, in my case. I do think that what was asked was maybe not what was needed, so a better discussion with said client is needed.

A “good enough” solution for me would indeed be to have some common base that I choose myself (e.g. mpl, or something custom) and convert from that to my target backend. That way, I can implement what (I think) the client wants without being bogged down too much by generality, as well as build a starting ground for anyone wanting to take up the momentous task.

In that case mpl (maybe taking some of the Vega-Lite grammar) → Bokeh would be a good starting point, as it’s Python first, and web second. Then we can forget about BokehJS for now.
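The "common base plus per-backend converters" idea could look roughly like the sketch below. `LinePlot`, `BACKENDS`, and both converter functions are hypothetical names invented here; the backends only produce text so the example stays free of plotting dependencies.

```python
import json
from dataclasses import dataclass


@dataclass
class LinePlot:  # hypothetical self-chosen common base
    title: str
    x: list
    y: list


def to_gnuplot(plot):
    """Emit a gnuplot script that reads inline data from the '-' pseudo-file."""
    rows = "\n".join(f"{a} {b}" for a, b in zip(plot.x, plot.y))
    return f'set title "{plot.title}"\nplot \'-\' with lines notitle\n{rows}\ne\n'


def to_json_blob(plot):
    """Emit a plain JSON description that some web frontend could consume."""
    return json.dumps(
        {"type": "line", "title": plot.title, "x": plot.x, "y": plot.y},
        sort_keys=True,
    )


# Swapping backends on the fly becomes a dictionary lookup.
BACKENDS = {"gnuplot": to_gnuplot, "json": to_json_blob}


def render(plot, backend):
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    return BACKENDS[backend](plot)


demo = LinePlot(title="demo", x=[0, 1], y=[2, 3])
gnuplot_script = render(demo, "gnuplot")
json_blob = render(demo, "json")
```

The common base stays deliberately small here; each feature added to it has to be implemented in every backend, which is where the generality cost shows up.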

Bryan · August 7, 2020, 8:38pm

Just to add some more historical context, Python-level adaptation was the first approach everyone tried. Jake Vanderplas created a small library called mplexporter that had hooks that other tools could use to adapt MPL rendering calls into their own rendering calls. Jake created this for his own mpld3 library at the time. Unfortunately it depended heavily on MPL private internals, and as MPL moved on, it eventually just stopped working. The brief “MPL compat” support that Bokeh had many years ago was based on this tool. I will say the results were never great (which is why we did not gnash our teeth about dropping MPL compat support).