I’ve been using Bokeh (0.12.3) to plot information from a networkx graph using a scatterplot backed by WMTS tiles and manually adding lines to represent connections. My graphs contain only a couple thousand nodes so far, well below Bokeh’s threshold, but I recently noticed that adding lines to the plot increased the run time dramatically, even for modest numbers of connections.
I wrote a fairly minimal example (attached) to demonstrate. On my system the no-lines plot finishes in about 7 seconds, while the lines plot takes 7.5 minutes.
Is there anything I can change to avoid this issue? Is there currently a better way to represent a graph with nodes and edges?
Well, a couple of thousand points in one column data source is definitely fine for Bokeh, as a general rule. But you appear to be creating something closer to a couple of thousand data sources, with a few points each. Slicing things that way creates a very different kind of overhead, and is probably the primary source of your problem. I have a few suggestions:
First, instead of calling line over and over, put all your edge data into one CDS (ColumnDataSource) and use the segment glyph method to draw all the edges in a vectorized fashion.
Basically, it takes a CDS with four columns: x0, y0, x1, y1, so that each "row" in the CDS represents the start and end coordinates of one segment. I would speculate that this will offer a significant improvement.
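To make the one-CDS idea concrete, here is a rough sketch. It is not taken from your attached example: the names edge_segments, plot_graph and positions are made up for illustration, and positions is assumed to be a dict mapping each node to an (x, y) pair, with edges being something like G.edges() from networkx.

```python
def edge_segments(positions, edges):
    """Build the four columns (x0, y0, x1, y1) that the segment glyph expects.

    positions: dict mapping node -> (x, y)  (assumed layout, hypothetical name)
    edges: iterable of (u, v) node pairs, e.g. G.edges() from networkx
    """
    x0, y0, x1, y1 = [], [], [], []
    for u, v in edges:
        ux, uy = positions[u]
        vx, vy = positions[v]
        x0.append(ux)
        y0.append(uy)
        x1.append(vx)
        y1.append(vy)
    return dict(x0=x0, y0=y0, x1=x1, y1=y1)


def plot_graph(positions, edges):
    """Draw every edge with one segment glyph and every node with one scatter glyph."""
    # Imported here so edge_segments can be used and tested without Bokeh.
    from bokeh.models import ColumnDataSource
    from bokeh.plotting import figure

    p = figure()

    # One data source and one glyph for all edges,
    # instead of one line() call (and one data source) per edge.
    edge_source = ColumnDataSource(edge_segments(positions, edges))
    p.segment(x0="x0", y0="y0", x1="x1", y1="y1", source=edge_source)

    # One data source and one glyph for all nodes.
    nodes = list(positions)
    node_source = ColumnDataSource(dict(
        x=[positions[n][0] for n in nodes],
        y=[positions[n][1] for n in nodes],
    ))
    p.scatter(x="x", y="y", source=node_source)
    return p
```

With something like this, the number of glyphs and data sources stays at two no matter how many edges the graph has. Passing the result of plot_graph to show() should then render the whole graph.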
Next, upgrade to 0.12.4 and make sure that every column that can be is a NumPy floating point or int32 array. This will automatically enable the new binary array protocol, which can dramatically improve serialization time under certain circumstances.
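As an illustration of the column conversion, here is a small helper you could run over a column dict before handing it to a ColumnDataSource. The name as_binary_friendly is hypothetical, and leaving non-numeric or out-of-range columns untouched is my own assumption about what is sensible, not something Bokeh requires.

```python
import numpy as np


def as_binary_friendly(columns):
    """Return a copy of `columns` with numeric columns as NumPy arrays.

    Float columns become float64 and integer columns become int32 when
    every value fits; anything else is left as whatever np.asarray gives.
    """
    int32_info = np.iinfo(np.int32)
    out = {}
    for name, values in columns.items():
        arr = np.asarray(values)
        if arr.dtype.kind == "f":
            out[name] = arr.astype(np.float64)
        elif arr.dtype.kind in "iu":
            fits = arr.size == 0 or (
                arr.min() >= int32_info.min and arr.max() <= int32_info.max
            )
            # Assumption: integers too large for int32 are left unchanged.
            out[name] = arr.astype(np.int32) if fits else arr
        else:
            # Strings, booleans, objects: left as-is.
            out[name] = arr
    return out
```

For example, ColumnDataSource(as_binary_friendly(my_columns)) would pass the edge or node columns through this step, where my_columns stands for whatever column dict you already build.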