Adding lines to scatterplot increases time disproportionately

kosta.kim · January 12, 2017, 11:48pm

I’ve been using Bokeh (0.12.3) to plot information from a networkx graph using a scatterplot backed by WMTS tiles and manually adding lines to represent connections. My graphs contain only a couple thousand nodes so far, well below Bokeh’s threshold, but I recently noticed that adding lines to the plot increased the run time dramatically, even for modest numbers of connections.

I wrote a fairly minimal example (attached) to demonstrate. On my system the no-lines plot finishes in about 7 seconds, while the lines plot takes 7.5 minutes.

Is there anything I can change to avoid this issue? Is there currently a better way to represent a graph with nodes and edges?

Thanks,

Kim

edges.py (2.57 KB)

Bryan · January 13, 2017, 12:31am

Well, a couple of thousand points in one column data source is definitely fine for Bokeh, as a general rule. But you appear to be creating more like a couple of thousand data sources, with a few points each. Slicing things that way creates a very different kind of overhead, and is probably the primary source of your problem. I have a few suggestions:

First, instead of calling line over and over, put all your edge data into one CDS (ColumnDataSource) and use the segment glyph method to draw all the edges in a vectorized fashion.

http://bokeh.pydata.org/en/latest/docs/reference/plotting.html#bokeh.plotting.figure.Figure.segment

Basically, it takes a CDS with four columns, x0, y0, x1, and y1, so that each "row" in the CDS holds the start and end coordinates of one segment. I suspect this alone will offer a significant improvement.
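To make the difference concrete, here is a minimal sketch comparing the two patterns. The original edges.py attachment is not reproduced in the thread, so the node positions and edge list below are hypothetical stand-ins (with networkx, G.edges() yields (u, v) pairs of the same shape). The first half calls line once per edge, which gives every edge its own small data source; the second half puts one row per edge into a single CDS and draws everything with one segment call. It uses the bokeh.plotting API, so check it against the Bokeh version you have installed.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical node positions and edge list standing in for the
# networkx graph in edges.py (not reproduced here).
pos = {0: (0.0, 0.0), 1: (1.0, 2.0), 2: (3.0, 1.0), 3: (2.0, 4.0)}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# Slow pattern: one line() call per edge. Each call builds its own
# GlyphRenderer backed by its own two-point ColumnDataSource.
p_slow = figure()
line_renderers = [
    p_slow.line([pos[a][0], pos[b][0]], [pos[a][1], pos[b][1]])
    for a, b in edges
]
n_line_sources = len({id(r.data_source) for r in line_renderers})

# Vectorized pattern: one CDS with one row per edge, where (x0, y0) is the
# start point and (x1, y1) the end point, drawn by a single segment() call.
source = ColumnDataSource(data={
    "x0": [pos[a][0] for a, _ in edges],
    "y0": [pos[a][1] for a, _ in edges],
    "x1": [pos[b][0] for _, b in edges],
    "y1": [pos[b][1] for _, b in edges],
})
p_fast = figure()
segment_renderer = p_fast.segment(x0="x0", y0="y0", x1="x1", y1="y1", source=source)

print(n_line_sources)                                # 4 data sources, one per edge
print(len(segment_renderer.data_source.data["x0"]))  # 4 rows in a single source
```

With a couple of thousand edges, the first pattern produces a couple of thousand renderers and data sources to serialize, while the second still produces exactly one of each.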

Next, upgrade to 0.12.4 and make sure that every column that can be is a NumPy floating-point or int32 array. This automatically enables the new binary array protocol, which can dramatically reduce serialization time under certain circumstances.

https://bokeh.github.io/blog/2017/1/6/release-0-12-4/
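As a sketch of the NumPy suggestion, the edge columns can be built directly as float64 arrays instead of Python lists. The input layout here is an assumption for illustration: node coordinates in an (n_nodes, 2) array and edges as an (n_edges, 2) array of node indices. Integer-array indexing gathers the endpoints of every edge in one step, and the resulting dict can be passed as ColumnDataSource(data=edge_columns) to the segment call described above.

```python
import numpy as np

# Hypothetical inputs: node coordinates as an (n_nodes, 2) array and
# edges as an (n_edges, 2) array of node indices.
node_xy = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [2.0, 4.0]])
edge_idx = np.array([[0, 1], [1, 2], [2, 3], [3, 0]])

# Integer-array indexing gathers the start and end coordinates of every
# edge at once, with no Python-level loop over edges.
start = node_xy[edge_idx[:, 0]]
end = node_xy[edge_idx[:, 1]]

# One contiguous 1-D float64 array per column, ready to hand to
# ColumnDataSource(data=edge_columns).
edge_columns = {
    "x0": np.ascontiguousarray(start[:, 0], dtype=np.float64),
    "y0": np.ascontiguousarray(start[:, 1], dtype=np.float64),
    "x1": np.ascontiguousarray(end[:, 0], dtype=np.float64),
    "y1": np.ascontiguousarray(end[:, 1], dtype=np.float64),
}

for name, col in edge_columns.items():
    print(name, col.dtype, col)
```

Keeping the columns as NumPy arrays from the start avoids building large Python lists that would then need converting, and it keeps them in a form the binary array protocol can use.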

Thanks,

Bryan


kosta.kim · January 21, 2017, 2:27pm

Ah, I didn’t realize each call to line was creating its own CDS. I appreciate the explanation.

I switched to using segment and that solved my problem.

Thanks!