Representing data with two categories by both color and marker shape

dollarklavs · October 21, 2019, 7:12pm

Is it possible to have a scatter plot where each data point's marker shape is determined by one category and its color by another category?
I would also like two legends: one where the marker shapes are drawn in grey, each next to its category value, and another where each color is shown next to its category value.
I thought I could draw two scatter plots: one with squares colored by one of the categories, and one with different marker shapes based on the other category. I would then keep the legends from these two and delete the GlyphRenderers. Finally, I would draw a scatter plot that combines both the marker shape and the color in one marker. Its legend doesn't look great, so I would delete it, leaving me with the two legends without plots, and the plot without its legend.
Unfortunately, I haven't been able to get this to work, as I get this warning:
WARNING:bokeh.core.validation.check:W-1000 (MISSING_RENDERERS): Plot has no renderers: Figure(id='1015', ...).
I also tried using a color bar, but that only works with continuous values.

Can anyone guide me on how to accomplish this?

Regards
Jonas

Bryan · October 21, 2019, 7:43pm

If you would like someone to help fix your code, please always share an example of exactly what you have tried. Otherwise, all I can really do is point you at this section of the documentation, which covers both color and marker mapping:

https://docs.bokeh.org/en/latest/docs/user_guide/data.html#transforming-data
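As a minimal sketch of what that documentation section covers, the two mappers can be driven by two *different* columns of the same data source. The DataFrame, its column names (`x`, `y`, `group`, `kind`), the palette and the marker list below are hypothetical, and the code assumes Bokeh 2.4 or 3.x. Because no `legend_*` argument is passed, Bokeh does not create a legend for this renderer.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import factor_cmap, factor_mark

# Hypothetical toy data: `group` drives the color, `kind` drives the marker shape.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [3, 1, 4, 1, 5, 9],
    "group": ["A", "B", "C", "A", "B", "C"],
    "kind": ["p", "q", "p", "q", "p", "q"],
})
source = ColumnDataSource(df)

GROUPS = sorted(df["group"].unique())  # factors for the color mapper
KINDS = sorted(df["kind"].unique())    # factors for the marker mapper

p = figure(title="Color from one column, marker from another")

# One renderer, two independent categorical mappings; no legend arguments,
# so no legend is created automatically.
r = p.scatter(
    "x", "y", source=source, size=12, fill_alpha=0.4,
    color=factor_cmap("group", palette=["#1b9e77", "#d95f02", "#7570b3"], factors=GROUPS),
    marker=factor_mark("kind", markers=["circle", "triangle"], factors=KINDS),
)
```

Calling `show(p)` from `bokeh.plotting` displays the result; the legends then have to be built separately, which is what the rest of this thread is about.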

It's not really clear what you are asking for regarding the legend.

dollarklavs · October 22, 2019, 7:56am

I'll try to ask a better question. Here goes:
I would like to create a scatter plot where each marker has a shape based on one category and a color based on a second category. I would also like to have two legends: one representing the shapes and one representing the colors.
Something like this:

[Image not preserved]

I'm not really sure how to do this, but I have tried creating a scatter plot for each category, using either a variable color or a variable marker shape, then assigning the renderers to a legend and deleting the renderers.

Finally, I add a scatter plot where both the color and the marker are mapped from data columns. I don't want a legend to show up for this plot.

Here is the code:

```python
import pandas as pd
from bokeh.plotting import figure, output_file, show, save
from bokeh.models import ColumnDataSource, CDSView, GroupFilter, Slope, ColorBar, Legend
from bokeh.transform import factor_cmap, factor_mark, linear_cmap

output_file(r"./analysis.html")

df = pd.read_csv(r"./sheet.csv", sep=";", decimal=",", thousands=".")

column_list = df.columns.tolist()
indstillings_type = df["indstillings_type"].unique().tolist()

# categories
indstillings_type = [x for x in indstillings_type if str(x) != "nan"]
AC = ["A", "B", "C", "D", "E", "F"]

# Constants
COLORS = ["#7fc97f", "#f0027f", "#386cb0", "#fdc086", "#beaed4", "#ffff99"]
MARKERS = ["hex", "circle_x", "triangle", "diamond", "asterisk", "square"]


# dataframe -> bokeh CDS
source = ColumnDataSource(df)

# initialize figure
p = figure(plot_width=1400, plot_height=800, active_scroll="wheel_zoom")

# create two scatter plots: one where the color is decided by one category, and
# one where the marker shape is decided by another category
marker = factor_mark("indstillings_type", MARKERS, indstillings_type)
color = factor_cmap("Overenskomst", COLORS, AC)
scatter_color = p.scatter(
    "Alder",
    "brutto_korrigeret",
    source=source,
    legend=color,
    fill_alpha=0.4,
    size=12,
    marker="square",
    color=color,
)
scatter_marker = p.scatter(
    "Alder",
    "brutto_korrigeret",
    source=source,
    legend=marker,
    fill_alpha=0.4,
    size=12,
    marker=marker,
    color="grey",
)

# delete the two renderers above
p._property_values.pop("renderers")

# Add just the legends
color_legend = Legend(items=[("Overenskomst", [scatter_color])])
marker_legend = Legend(items=[("indstillings_type", [scatter_marker])])

# add the final scatter plot. The goal is to keep the markers, but not the
# legend, relying on the two legends from above.
scatter_marker = p.scatter(
    "Alder",
    "brutto_korrigeret",
    source=source,
    fill_alpha=0.4,
    size=12,
    marker=marker,
    color=color,
)

p.add_layout(color_legend, "left")
p.add_layout(marker_legend, "right")
p.legend.location = "bottom_right"

save(p)
```
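The `p._property_values.pop("renderers")` line reaches into private Bokeh state and effectively discards every renderer on the plot at once. If the intent is only to take particular renderers back out of a plot, a sketch using the public `renderers` list looks like this (toy data, assuming Bokeh 2.4 or 3.x). Note that a legend item whose renderer is no longer part of the plot may not be drawn, so this alone is not expected to produce the legend-only entries the question asks for.

```python
from bokeh.plotting import figure

p = figure()
keep = p.scatter([1, 2, 3], [4, 5, 6], marker="circle", color="navy")
drop = p.scatter([1, 2, 3], [6, 5, 4], marker="square", color="grey")

# Remove only the unwanted renderer through the public list API,
# instead of popping the private p._property_values entry (which drops all of them).
p.renderers.remove(drop)
```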

Bryan · October 22, 2019, 4:26pm

What you want is possible, but is also unusual, and the API does not cater to it. It will require fighting the library slightly to achieve what you want. I can work up a toy demonstration but it will not be until tomorrow or so.

Bryan · October 23, 2019, 12:54am

Here is a toy example that illustrates the techniques you can use (you will need to adapt them to your specifics):

```python
from bokeh.models import Legend, LegendItem
from bokeh.palettes import Category10_3
from bokeh.plotting import figure, show
from bokeh.sampledata.iris import flowers
from bokeh.transform import factor_cmap, factor_mark

SPECIES = ['setosa', 'versicolor', 'virginica']
MARKERS = ['x', 'circle', 'triangle']

p = figure()

# plot the actual data using factor and color mappers (using the same
# column `species` here but you can use two different columns if you want)
r = p.scatter("petal_length", "sepal_width", source=flowers, fill_alpha=0.4, size=12,
              marker=factor_mark('species', MARKERS, SPECIES),
              color=factor_cmap('species', Category10_3, SPECIES))

# we are going to add "dummy" renderers for the legends, restrict auto-ranging
# to only the "real" renderer above
p.x_range.renderers = [r]
p.y_range.renderers = [r]

# create an invisible renderer to drive color legend
rc = p.rect(x=0, y=0, height=1, width=1, color=Category10_3)
rc.visible = False

# add a color legend with explicit index, set labels to fit your needs
legend = Legend(items=[
    LegendItem(label=SPECIES[i], renderers=[rc], index=i) for i, c in enumerate(Category10_3)
], location="top_center")
p.add_layout(legend)

# create an invisible renderer to drive shape legend
rs = p.scatter(x=0, y=0, color="grey", marker=MARKERS)
rs.visible = False

# add a shape legend with explicit index, set labels to fit your needs
legend = Legend(items=[
    LegendItem(label=MARKERS[i], renderers=[rs], index=i) for i, s in enumerate(MARKERS)
], location="top_right")
p.add_layout(legend)

show(p)
```
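Adapting the toy example above to the question's setup (color and marker driven by two different columns, both legends placed outside the plot area) might look like the sketch below. Only the column names `Alder`, `brutto_korrigeret`, `Overenskomst` and `indstillings_type` come from the question; the data values, the shortened category lists, the use of `Legend(title=...)`, and stacking both legends with `add_layout(..., "right")` are assumptions, written against Bokeh 2.4/3.x.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, Legend, LegendItem
from bokeh.plotting import figure
from bokeh.transform import factor_cmap, factor_mark

# Hypothetical stand-in for the question's CSV; only the column names are from the thread.
df = pd.DataFrame({
    "Alder": [25, 32, 41, 29, 53, 47],
    "brutto_korrigeret": [31000, 35500, 42000, 33000, 47000, 45000],
    "Overenskomst": ["A", "B", "C", "A", "B", "C"],
    "indstillings_type": ["x1", "x2", "x1", "x2", "x1", "x2"],
})
source = ColumnDataSource(df)

AC = ["A", "B", "C"]                        # color categories (hypothetical values)
TYPES = ["x1", "x2"]                        # marker categories (hypothetical values)
COLORS = ["#7fc97f", "#f0027f", "#386cb0"]
MARKERS = ["hex", "triangle"]

p = figure(width=900, height=500)

# The real data: color from one column, marker shape from another, no legend arguments.
r = p.scatter("Alder", "brutto_korrigeret", source=source, size=12, fill_alpha=0.4,
              color=factor_cmap("Overenskomst", COLORS, AC),
              marker=factor_mark("indstillings_type", MARKERS, TYPES))

# Auto-range on the real renderer only, so the dummies at (0, 0) don't stretch the axes.
p.x_range.renderers = [r]
p.y_range.renderers = [r]

# Hidden dummy renderer: one colored square per color category.
rc = p.scatter(x=0, y=0, marker="square", size=12, color=COLORS)
rc.visible = False

# Hidden dummy renderer: one grey marker per shape category.
rs = p.scatter(x=0, y=0, size=12, color="grey", marker=MARKERS)
rs.visible = False

# Each LegendItem points at one row (index) of its dummy renderer's data.
color_legend = Legend(title="Overenskomst", items=[
    LegendItem(label=label, renderers=[rc], index=i) for i, label in enumerate(AC)])
marker_legend = Legend(title="indstillings_type", items=[
    LegendItem(label=label, renderers=[rs], index=i) for i, label in enumerate(TYPES)])

# Place both legends outside the plot area; they stack on the right-hand side.
p.add_layout(color_legend, "right")
p.add_layout(marker_legend, "right")
```

Calling `show(p)` or `save(p)` renders it. Keeping the dummy renderers in the plot with `visible = False`, rather than deleting them, is what lets the legends draw their glyphs.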

dollarklavs · October 29, 2019, 9:58am

Thanks a lot, Bryan.
Your code was just what I was looking for.

P.S. I'm aware that it is unusual and maybe a little cluttered to look at, but I haven't found a better way to represent two categorical variables on a scatter plot.

Again, thanks for helping.
Regards,
Jonas

nodice · May 14, 2020, 11:52pm

Thank you for the solution. I was trying to do the same thing!