Multipolygon and polygon plot with GeoJSONDataSource

I’ve been having lots of success geoplotting points with Bokeh, but have now run into an issue trying to plot polygons and multipolygons, where even the most basic plot times out (I have left it running for hours). It’s my first time using GeoJSONDataSource and I think it’s time I asked for help. I’m trying to plot a GeoPandas GeoDataFrame of the format below. I’d upload the GeoJSON because all the data is publicly available, but it’s outputting a 162.8 MB file, which I’m thinking might be the problem? Or do I have to treat the multipolygons and polygons differently? The polygons are Canadian electoral districts.

type(data_CAN)
geopandas.geodataframe.GeoDataFrame

data_CAN.shape
(293, 3)

	POP	    Bool	geometry
0	2731571	True	MULTIPOLYGON (((7239903.869 950427.517, 723992...
1	2463431	True	MULTIPOLYGON (((4042996.157 2027876.066, 40452...
2	1942044	True	MULTIPOLYGON (((7632208.177 1268333.720, 76321...
3	1498778	True	POLYGON ((4660000.566 2031508.140, 4661063.386...
4	1381739	True	MULTIPOLYGON (((7194207.560 947086.957, 719435...

Following a couple of good examples online, I’m wondering why the code block hangs and won’t display in the notebook. I’m using Bokeh 2.0.2 with Python 3.8.2 in a VSCode workspace with an up-to-date Jupyter client and core installed.

This completes:

# Create Bokeh data structure from the GeoDataFrame's GeoJSON string
from bokeh.models import GeoJSONDataSource

sourceG = GeoJSONDataSource(geojson=data_CAN.to_json())

But then hangs here:

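from bokeh.io import output_notebook
from bokeh.plotting import figure, show

# Render inline in the notebook
output_notebook()

plot = figure(
    title="Area Weighted Mean : Canada",
    plot_height=600,
    plot_width=800,
)

# "xs" and "ys" are the columns GeoJSONDataSource creates for polygon geometries
plot.patches(
    "xs",
    "ys",
    source=sourceG,
    line_color="black",
    line_width=0.1,
)

show(plot)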

It very well may be that it’s not doing anything and it just threw an exception - either on the Python or on the JS side.

Please provide a minimal runnable example with the data that reproduces the issue.

Unfortunately, even posting a couple of GeoJSON entries is over the character limit for the forums, and I don’t have anywhere I can post and link a sample file.

However, I did learn something from the exercise.

Taking the first 20 out of 293 does plot. However, it doesn’t really look like Canada.

Does that mean the problem is with the size of the GeoDataFrame itself? I was under the impression Bokeh was quite robust for large file sizes. But these are really complex polygons. Should I look at downsampling the polygons? I let it render in the cell for an hour and it never gave an error.

Or is it more likely I have invalid entries that are triggering it to time out?

@BBirdsell gists are good place to quickly share things: https://gist.github.com/

I will say up front that this many polygons/points may simply be too taxing. Bokeh prioritizes interactivity and being able to drill down into data, but the tradeoff there is that all the data has to be sent to the browser. This is much more expensive (size-wise) than, say, sending a rendered image. Though it is possible there is some usage issue or other change that might improve things. But it’s hard to do more than speculate without a full code sample to run and experiment with, as @p-himik has noted.

Note that Datashader now supports rendering polygons, and HoloViews or hvPlot can be used to construct Datashader-rendered polygon plots in Bokeh, allowing you to work with much larger polygonal datasets than would be practical in Bokeh itself. See e.g. https://anaconda.org/jonmmease/datashader_spatialpandas/notebook .

Hmmm… it even has dask integration… I think I’ll investigate this route because I have an even bigger viz of the US counties to try next. Thanks for the info.

@James_A_Bednar1 can that notebook be moved or copied out of the personal account and in to some kind of project account?

Gist certainly seems like an interesting product. Finally had time to upload a sample dataset. I don’t dare do more than 50 entries of the over 200. But it’s there in the raw if that’s helpful.

I assume that spatialpandas works similarly to geopandas? I wasn’t able to find a description of their differences but got errors when trying to install them in the same conda env.

https://gist.github.com/BBirdselllab/4f38c78ad87bd5f93c1933badc5a13d9

What are those coordinates? They obviously aren’t lat/lon but I guess they don’t seem like the right order of magnitude for web-mercator either. Also, probably the most relevant question: what is the total number of points?
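
One way to answer the last question without any geospatial libraries is to count coordinate pairs directly in the GeoJSON text. This is only a minimal sketch, assuming the input is a FeatureCollection that contains nothing but Polygon and MultiPolygon geometries; count_vertices, total_vertices and the sample data are made up for illustration.

```python
import json


def count_vertices(geometry):
    """Return the number of positions in a GeoJSON Polygon or MultiPolygon."""
    gtype = geometry["type"]
    if gtype == "Polygon":
        polygons = [geometry["coordinates"]]
    elif gtype == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {gtype}")
    # Each polygon is a list of rings; each ring is a list of [x, y] positions.
    return sum(len(ring) for polygon in polygons for ring in polygon)


def total_vertices(geojson_text):
    """Sum the vertex counts of every feature in a FeatureCollection string."""
    collection = json.loads(geojson_text)
    return sum(count_vertices(f["geometry"]) for f in collection["features"])


# Hypothetical sample: one square Polygon and one MultiPolygon of two triangles.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {}, "geometry": {
            "type": "MultiPolygon",
            "coordinates": [
                [[[2, 2], [3, 2], [2, 3], [2, 2]]],
                [[[5, 5], [6, 5], [5, 6], [5, 5]]]]}},
    ],
})

print(total_vertices(sample))  # 13
```

With the data from this thread, the equivalent call would presumably be total_vertices(data_CAN.to_json()), since that is the same string passed to GeoJSONDataSource.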

It looks like the Canadian census divisions to me, which can be downloaded at the link below. The projection is EPSG:3347.
https://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/bound-limit-2016-eng.cfm

Other than that, I agree with what has been said earlier. The level of detail in the data is massive overkill; it can never be shown on screen at once. Some options would be:

  • simplify
  • rasterize
  • load in chunks (based on extent/ zoom-level)
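
As a rough illustration of the third option, features can be filtered by bounding box so that only those overlapping the current view are sent to the browser. This is a sketch on a parsed GeoJSON dict, not a Bokeh API; geometry_bounds, boxes_intersect, features_in_view and the sample data are hypothetical names and values, and wiring the filter to the plot's visible range is left out.

```python
def geometry_bounds(geometry):
    """Return (minx, miny, maxx, maxy) for a GeoJSON Polygon or MultiPolygon."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError(f"unsupported geometry type: {geometry['type']}")
    points = [pt for polygon in polygons for ring in polygon for pt in ring]
    xs = [pt[0] for pt in points]
    ys = [pt[1] for pt in points]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def features_in_view(collection, view):
    """Keep only the features whose bounding box overlaps the view box."""
    return [
        feature for feature in collection["features"]
        if boxes_intersect(geometry_bounds(feature["geometry"]), view)
    ]


# Hypothetical sample: two unit squares far apart from each other.
collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "A"}, "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
        {"type": "Feature", "properties": {"name": "B"}, "geometry": {
            "type": "Polygon",
            "coordinates": [[[10, 10], [11, 10], [11, 11], [10, 11], [10, 10]]]}},
    ],
}

visible = features_in_view(collection, view=(0.5, 0.5, 2.0, 2.0))
print([f["properties"]["name"] for f in visible])  # ['A']
```

A bounding-box test can keep a feature whose actual outline lies just outside the view, which is usually acceptable for deciding what to load.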

There are sections in the data which have a vertex every 20 to 30 meters or so, even (more or less) straight lines are sampled like that. Great as raw data, but that’s not something you want to visualize, regardless of the library you use. Due to all the straight administrative lines, you should be able to simplify it a lot without sacrificing much detail.
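
To show why densely sampled straight lines simplify so well, here is a pure-Python sketch of the Ramer-Douglas-Peucker idea: drop every vertex that lies within a tolerance of the straight line between the points that are kept. The function names and the test line are invented for illustration, and the sketch handles open lines only. In practice GeoPandas offers GeoSeries.simplify(tolerance, preserve_topology=True), with the tolerance in the units of the coordinate system (metres, I believe, for EPSG:3347).

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all are (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify_line(points, tolerance):
    """Recursively drop vertices closer than tolerance to the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord joining the two end points.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify_line(points[:index + 1], tolerance)
    right = simplify_line(points[index:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves


# Hypothetical boundary: a vertex every 25 m with 1 m of jitter, then a corner.
line = [(float(x), 1.0 if i % 2 else 0.0) for i, x in enumerate(range(0, 1001, 25))]
line += [(1000.0, float(y)) for y in range(25, 501, 25)]

simplified = simplify_line(line, tolerance=5.0)
print(len(line), len(simplified))  # 61 3
```

With a 5 m tolerance, the 61 input vertices collapse to the two end points plus the corner, which is the kind of reduction long administrative boundaries allow.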

Oh good grief… I don’t think I realized the resolution was that small. I’ll go and try to simplify the paths a bit because I’m pretty comfortable with geopandas and don’t need to analyze the paths directly. Are there any good resources online you’d recommend to follow?