I am trying to plot multiple groups using scatter, when the groups are defined in a separate list. Like this:
from bokeh.plotting import figure, show
from bokeh.embed import components
from bokeh.palettes import viridis
import pandas as pd
data = pd.DataFrame({'x':[1,2,3,4,3], 'y':[1,2,4,4,1]})
groups = ['a','a','b','a','b']
colors = ['red','red','black','red','black']
p = figure()
p.scatter(data.x, data.y, color=colors) #this works
show(p)
p2 = figure()
p2.scatter(data.x, data.y, color=colors, legend=groups) #this does not
Suggestions? I noticed that I could use circle instead of scatter and go through all groups separately, but that is quite an inefficient and ugly solution (in practice I have an arbitrary number of groups and thousands of points). Also, I would like to use the same approach when I have a projection from, say, scikit, so instead of passing data.x and data.y I would have something like *data.T.
BokehJS can do this grouping, but only if all the data is in a Bokeh ColumnDataSource. It's easy to accomplish by passing the dataframe to the glyph function as "source":
from bokeh.plotting import figure, show
from bokeh.embed import components
from bokeh.palettes import viridis
import pandas as pd
groups = ['a','a','b','a','b']
colors = ['red','red','black','red','black']
# need all data here in one place
data = pd.DataFrame({'x':[1,2,3,4,3], 'y':[1,2,4,4,1], 'groups': groups, 'colors': colors})
p = figure()
# use column names and pass a source
p.scatter('x', 'y', color='colors', legend='groups', source=data)
show(p)
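For an arbitrary number of groups, the color column can be derived from the group column instead of being typed by hand. The following is a minimal sketch, not a definitive recipe: it assumes a Bokeh release that provides the legend_group keyword (newer releases use it in place of a plain legend column name), and the names points, color_map and unique_groups are made up for illustration.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import viridis
from bokeh.plotting import figure

# Hypothetical input: any (n, 2) array-like works here, e.g. a 2-D projection.
points = [[1, 1], [2, 2], [3, 4], [4, 4], [3, 1]]
groups = ["a", "a", "b", "a", "b"]

# Keep coordinates, group labels and colors together in one table.
data = pd.DataFrame(points, columns=["x", "y"])
data["groups"] = groups

# One palette entry per distinct group, so the number of groups can vary.
unique_groups = sorted(data["groups"].unique())
color_map = dict(zip(unique_groups, viridis(len(unique_groups))))
data["colors"] = data["groups"].map(color_map)

source = ColumnDataSource(data)

p = figure()
# legend_group creates one legend entry per distinct value in the column.
p.scatter("x", "y", color="colors", legend_group="groups", source=source)
```

Calling bokeh.plotting.show(p) displays the result. A projection returned as a two-column array should be usable in place of points in the same way, since pd.DataFrame accepts it with the same columns argument.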
That works well. One thing I wonder is how much overhead there is in continuously switching back and forth between a Pandas DataFrame and a ColumnDataSource. I mean that every time I draw something, I need to take a subset of the DataFrame and transform it into a ColumnDataSource?
There's certainly some overhead, but it's not really avoidable. Ultimately all the data has to be serialized, transferred between processes, and deserialized. There are many, many copies and transformations that happen along the way.
That said, you might also look at filtering CDS with views instead of making new ones:
However, please be advised that this is a very new feature, and one that touched many parts of Bokeh at once, so there are definitely still some kinks to work out.