Line plot using view

Hello,

I want to use a view with a line plot to reduce memory consumption.

import numpy as np

from bokeh.io import show
from bokeh.models import ColumnDataSource, CDSView, GroupFilter
from bokeh.plotting import figure

x = np.linspace(0, 10, 10)
y = 5 * x + 10
kind = ["a"] * 5 + ["b"] * 5

data = ColumnDataSource(dict(x=x, y=y, kind=kind))
view_a = CDSView(filter=GroupFilter(name="kind", group="a"))
view_b = CDSView(filter=GroupFilter(name="kind", group="b"))

The code without view

p = figure()
p.line(x="x", y="y", source=data)
show(p)

works well. But if I use a view

p.line(x="x", y="y", source=data, view=view_a)
show(p)

it raises

ERROR:bokeh.core.validation.check:E-1024 (CDSVIEW_FILTERS_WITH_CONNECTED): CDSView filters are not compatible with glyphs with connected topology such as Line or Patch: GlyphRenderer(id='p1738', ...)
...
UnsetValueError: GroupFilter(id='p1687', ...).column_name doesn't have a value set

According to https://github.com/bokeh/bokeh/issues/9388#issuecomment-1151378497 it looks like MultiLine should work. But I can’t work out how to use figure.multi_line with a view; code like

p.multi_line(xs="x", ys="y", source=data, view=view_a)
show(p)

raises UnsetValueError: GroupFilter(id='p1864', ...).column_name doesn't have a value set even if I pass x, y and kind as lists of lists.

Moreover, it is not clear what to do if the data source is a pandas.DataFrame (convert it to a dict-like of lists of arrays/lists/Series?).

First things first: a view will not reduce memory consumption, at all. A view lets you specify a subset of the full data to use, but the full data is still always there. Otherwise it would not be possible to update a view dynamically, or have multiple different views of the same data at the same time.

I can add a couple things.

  1. CDSView is not compatible with the Line glyph (or with Patch, or any other glyph with a single connected topology). There is a good discussion of why in "Explicitly warn that CDSView is unsupported on line glyphs · Issue #9388 · bokeh/bokeh · GitHub". Basically, it doesn’t make sense to apply a filter/sub-view to a single connected topology because the desired behaviour is ambiguous: do you want the line to be “broken” at the filtered indices, or do you want it to simply “skip” them and connect to the next unfiltered index? There are probably other desired behaviours beyond these two as well.

  2. Looking at your MRE, your ColumnDataSource isn’t Line glyph material anyway because (I think) you want two separate lines (one for ‘a’ and one for ‘b’)?
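
The two behaviours named in point 1 can be sketched with plain NumPy, before any glyph is involved. This is only an illustration: the mask `keep` is a made-up filter, and the “broken” variant relies on the assumption (as far as I know, true for line and multi_line) that Bokeh draws a gap where a coordinate is NaN.

```python
import numpy as np

x = np.linspace(0, 10, 10)
y = 5 * x + 10

# Hypothetical filter: drop the points at indices 1, 4 and 7.
keep = np.arange(x.size) % 3 != 1

# "Skip": remove the filtered points, so their neighbours get joined directly.
x_skip, y_skip = x[keep], y[keep]

# "Broken": keep every position but blank out the filtered values,
# so the line would be interrupted at those indices.
y_broken = np.where(keep, y, np.nan)
```

Either result can be passed to an ordinary line call, which is one way to get a specific behaviour without a view.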

So I think you’re right to go down the multi_line route. And you’ve correctly surmised that you need to set up your source with the xy coords as lists of lists, so that each line corresponds to one record in the CDS.

Your issue from there is actually that you’re assigning the name attribute on the GroupFilter, whereas you should be assigning column_name. See the filters page of the Bokeh 3.3.4 documentation; they are wildly different things.

Working example:

import numpy as np

from bokeh.io import show
from bokeh.models import ColumnDataSource, CDSView, GroupFilter
from bokeh.plotting import figure
import pandas as pd

x = np.linspace(0, 10, 10)
y = 5 * x + 10
kind = ["a"] * 5 + ["b"] * 5

df = pd.DataFrame(data={'x':x,'y':y,'kind':kind})

# Collapse each group into one row whose 'x' and 'y' cells hold full lists
gb = df.groupby('kind').agg({c: list for c in ['x', 'y']}).reset_index()
                             
data = ColumnDataSource(gb)
view_a = CDSView(filter=GroupFilter(column_name="kind", group="a"))
view_b = CDSView(filter=GroupFilter(column_name="kind", group="b"))

p = figure()
p.multi_line(xs="x", ys="y", source=data, view=view_a)
show(p)

@Bryan you are right, my bad. I thought of a view as something similar to the pandas df.loc[...] mechanism (is that so?). So “reduce memory consumption” was meant only in that sense (e.g., sharing source data between different Bokeh objects).

@gmerritt123 thanks, it works well.

But is there a way to use the source df as the data for the ColumnDataSource, using view tricks, without “doubling” it with gb?

For example, if I add figure.circle() using the same df, it expects x and y as lists (not lists of lists).

Is there an analogue of multi_line() for figure.scatter(), i.e. for “point” plots (a way to use gb only)?

P.S. column_name → name, ohhh (coding at night clouds the mind :sweat_smile:). Thanks a lot for help with such things!

I wish. See the discussion about that idea here: [FEATURE] MultiScatter · Issue #12367 · bokeh/bokeh · GitHub

@Bryan you noted in [FEATURE] MultiScatter · Issue #12367 · bokeh/bokeh · GitHub

I suppose another consideration is that multi-line lacks a way to have “line plus marker” and this would afford that. cc @bokeh/dev for thoughts.

I can’t find a way to add points to line or multi_line (something like plot(x, y, "o-") in Matplotlib). Is it not supported yet?

If I understand correctly, the only way is something like

p.line(x, y, line_color=...)
p.circle(x, y, fill_color=...)

and

p.multi_line(xs, ys, line_color=...)
for x, y in zip(xs, ys):
    p.circle(x, y, fill_color=...)

@dinya More or less. It should also be possible to have one call to circle even in the multi_line case, rather than a loop. But to do that, you would need to flatten all the xs and ys arrays into larger “combined” 1d arrays to pass to circle. Whether one way or the other is preferable depends on the specific situation.
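
A minimal sketch of that flattening step, using only the standard library. The sample data and the names flat_x, flat_y and flat_kind are illustrative, not anything Bokeh defines:

```python
from itertools import chain

# Per-line coordinates, as they would be passed to multi_line.
xs = [[0, 1, 2], [3, 4, 5]]
ys = [[10, 15, 20], [25, 30, 35]]
kinds = ["a", "b"]

# Concatenate the nested lists into flat 1d sequences for a single marker call.
flat_x = list(chain.from_iterable(xs))
flat_y = list(chain.from_iterable(ys))

# Repeat each line's label once per point, in case markers need per-group styling.
flat_kind = [k for k, line in zip(kinds, xs) for _ in line]
```

The flat sequences could then go to a single p.circle(flat_x, flat_y, ...) call, at the cost of holding the coordinates a second time alongside the nested xs and ys.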

I’ve left a comment on the issue. There has been some recent work on “glyph decoration”, which might end up being a more sensible route than a new, separate MultiScatter glyph, but we’ll need to see where things currently stand.

@Bryan Thanks for the information.

Do I understand correctly that the feature being developed will allow not only “lines+points” (“multi_line+scatter”), but also “points” alone (only “scatter”)?

@dinya I don’t really understand the question. “Only points” (i.e. “scatter”) already exists today. Also, just to be clear, nothing is actually under development yet; things are still very much at the discussion stage.

@Bryan I meant: are there plans to implement a mechanism similar to the “line – multi_line” pair, i.e. “scatter – multi_scatter (multi_point)”? Or is the plan still to implement only the optional addition of points to multi_line (like plot(x, y, "o-") in Matplotlib terms)?

For example, multi_scatter would be useful for a boxplot. The box is created with patches, the whiskers with multi_line, and the outliers with multi_scatter. All glyphs use the same source (lists of lists) for xs and ys.

There’s nothing concrete planned yet; everything is still just at the discussion phase. Posting to the issue with your use case is a good way to register requirements to be considered.
