Adding legends to scatter plot from a list of strings

I am trying to plot multiple groups using scatter, when the groups are defined in a separate list. Like this:

from bokeh.plotting import figure, show
from bokeh.embed import components
from bokeh.palettes import viridis
import pandas as pd

data = pd.DataFrame({'x':[1,2,3,4,3], 'y':[1,2,4,4,1]})
groups = ['a','a','b','a','b']
colors = ['red','red','black','red','black']

p = figure()
p.scatter(data.x, data.y, color=colors) #this works
show(p)
p2 = figure()
p2.scatter(data.x, data.y, color=colors, legend=groups) #this does not

Suggestions? I noticed that I could use circle instead of scatter and go through all the groups separately, but that is quite an inefficient and ugly solution (in practice I have an arbitrary number of groups and thousands of points). Also, I would like to use the same approach when I have a projection from, say, scikit, so instead of passing data.x and data.y I would pass something like *data.T.

BokehJS can do this grouping, but only if all the data is in a Bokeh ColumnDataSource. It's easy to accomplish by passing the dataframe to the glyph function as "source":

  from bokeh.plotting import figure, show
  from bokeh.embed import components
  from bokeh.palettes import viridis
  import pandas as pd

  groups = ['a','a','b','a','b']
  colors = ['red','red','black','red','black']

  # need all data here in one place
  data = pd.DataFrame({'x':[1,2,3,4,3], 'y':[1,2,4,4,1], 'groups': groups, 'colors': colors})

  p = figure()

  # use column names and pass a source
  p.scatter('x', 'y', color='colors', legend='groups', source=data)

  show(p)

This kind of usage is described here:

  https://bokeh.pydata.org/en/latest/docs/user_guide/annotations.html#legends
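Note for readers on newer Bokeh releases: the plain `legend` keyword used above was later deprecated in favor of more specific keywords. The following is only a sketch of the same idea, assuming a release that provides `legend_group` (1.4 or later, as far as I am aware); the explicit `ColumnDataSource` and the `renderer` variable are added here for illustration and are not part of the original answer.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
import pandas as pd

# Same toy data as in the thread, with group and color stored as columns.
data = pd.DataFrame({
    'x': [1, 2, 3, 4, 3],
    'y': [1, 2, 4, 4, 1],
    'groups': ['a', 'a', 'b', 'a', 'b'],
    'colors': ['red', 'red', 'black', 'red', 'black'],
})

# Building the source explicitly; passing the DataFrame as source= also works.
source = ColumnDataSource(data)

p = figure()

# legend_group names a column of the source; one legend entry is created
# per distinct value found in that column.
renderer = p.scatter('x', 'y', color='colors', legend_group='groups', source=source)
```

Calling `show(p)` afterwards should display a single scatter glyph with one legend entry for 'a' and one for 'b'.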

Thanks,

Bryan

Thanks,

That works well. One thing I wonder is how much overhead there is in continuously switching back and forth between a Pandas DataFrame and a ColumnDataSource. I mean, every time I draw something, do I need to take a subset of the DataFrame and transform it into a ColumnDataSource?

Best,
Jouni

Hi,

There's certainly some overhead, but it's not really avoidable. Ultimately all the data has to be serialized, transferred between processes, and deserialized. There are many, many copies and transformations that happen along the way.

That said, you might also look at filtering CDS with views instead of making new ones:

  https://bokeh.pydata.org/en/latest/docs/user_guide/data.html#filtering-data-with-cdsview

However, please be advised that this is a very new feature, and one that touched many parts of Bokeh at once, so there are definitely still some kinks to work out.

Thanks,

Bryan
