Providing data for a multi line from a dataframe? Having trouble figuring out how to restructure the dataframe / use multi_line's syntax

tvkyq · March 29, 2020, 2:14am

I’ve got a big dataframe called dataset which kinda looks like this:

Year	New_ID	Peak_change
1980	1	1
1980	2	0.985478
1980	3	0.974417
1990	1	1
1990	2	0.996124
1990	3	0.98718
1990	4	0.990161
1990	5	0.980106
2000	1	1
2000	2	0.985065
2000	3	0.984873
2000	4	0.978173

And I’m trying to create a line graph which looks like the output from this stackoverflow question. I’ve tried this:

from bokeh.plotting import figure, output_file, show
output_file('index.html')
p = figure(title='Simple')
p.line(x=dataset.New_ID, y=dataset.Peak_change, legend_label='Year')
show(p)

But it isn’t being split up into different lines based off legend_label in the way circle plots seem to automatically work out. It seems I need a multi line graph, but I don’t understand how to provide the data using the syntax demonstrated here. Is it possible to set this up in some kind of for loop?

p-himik · March 29, 2020, 7:39am

MultiLine is a single glyph. If you want each line to have a separate legend item, you will have to use the regular Line glyph multiple times:

from io import StringIO

import pandas as pd
from bokeh.plotting import figure, show

df = pd.read_csv(StringIO("""Year	New_ID	Peak_change
1980	1	1
1980	2	0.985478
1980	3	0.974417
1990	1	1
1990	2	0.996124
1990	3	0.98718
1990	4	0.990161
1990	5	0.980106
2000	1	1
2000	2	0.985065
2000	3	0.984873
2000	4	0.978173"""), sep='\t')

p = figure(title='Simple')
for name, group in df.groupby('Year'):
    p.line(x=group.New_ID, y=group.Peak_change, legend_label=str(name))

show(p)

tvkyq · March 29, 2020, 9:40am

Thank you!! This works beautifully!

If I could bother you on a slightly different topic, I’m now stuck on trying to assign different colors to the lines but I don’t understand how to integrate that into this for loop… I’ve tried to adapt the code from this example:

for name, group, i in dataset.groupby('Year'), Spectral11:
    p.line(x=group.New_ID, y=group.Peak_change, legend_label= str(name), color= Spectral11[i])

But for reasons probably obvious to you that didn’t work. Any ideas?

p-himik · March 29, 2020, 9:56am

Given the example that you link (I didn’t check the code):

for (name, group), color in zip(dataset.groupby('Year'), Spectral11):
    p.line(x=group.New_ID, y=group.Peak_change, legend_label=str(name), color=color)

Note that this will display up to 11 lines because Spectral11 has only 11 colors. If you need more lines, some other palette or coloring scheme should be used.
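To illustrate both the parenthesized unpacking and the palette-length limit, here is a small sketch in plain Python. The `groups` list and `palette` tuple are made-up stand-ins for `dataset.groupby('Year')` and a Bokeh palette, and `itertools.cycle` is only one possible way to reuse colors when there are more lines than colors.

```python
from itertools import cycle

# Stand-ins (assumptions): groupby yields (name, group) pairs, and a palette
# is a sequence of color strings.
groups = [(1980, "rows for 1980"), (1990, "rows for 1990"), (2000, "rows for 2000")]
palette = ("#1f77b4", "#ff7f0e")  # deliberately shorter than groups

# zip pairs each (name, group) tuple with one color and stops at the
# shorter input, so the third group is silently dropped.
paired = [(name, color) for (name, group), color in zip(groups, palette)]

# cycle() repeats the palette, so every group gets a color (with reuse).
cycled = [(name, color) for (name, group), color in zip(groups, cycle(palette))]
```

Reusing colors keeps every line on the plot, at the cost of some lines sharing a color.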

tvkyq · March 29, 2020, 10:30am

OK great I think that’s nearly there but it didn’t quite work. I’ve tried this, with Category20:

for (name, group), color in zip(dataset.groupby('Year'), Category20):
    p.line(x=group.New_ID, y=group.Peak_change, legend_label=str(name), color=color)

The error message it throws is ValueError: expected an element of either Enum('aliceblue', 'antiquewhite', 'aqua', 'aquamarine', ... going on for maybe 100 colors. I don’t know what Enum means but could it be expecting some kinda string?

p-himik · March 29, 2020, 12:52pm

That’s because Category20 is not a palette itself, it’s a collection of palettes. Check out its definition in the Bokeh sources.
You probably want to use Category20_20.
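A plain-Python sketch of what goes wrong: iterating over a dict yields its keys, so `color` ends up being a number rather than a color string. The `fake_category20` dict below is a hypothetical two-entry stand-in shaped like Bokeh's `Category20` (palette size mapped to palette), not the real object.

```python
# Hypothetical stand-in shaped like bokeh.palettes.Category20:
# keys are palette sizes, values are the palettes themselves.
fake_category20 = {
    3: ("#1f77b4", "#aec7e8", "#ff7f0e"),
    4: ("#1f77b4", "#aec7e8", "#ff7f0e", "#ffbb78"),
}
years = [1980, 1990, 2000]

# Zipping with the dict iterates over its keys, so the "colors" are ints.
wrong = [color for _, color in zip(years, fake_category20)]

# Indexing by size first gives an actual palette of color strings.
right = [color for _, color in zip(years, fake_category20[3])]
```

With the real object, `Category20[20]` is the same palette as `Category20_20`.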

tvkyq · March 29, 2020, 2:02pm

Ah! Thank you.

Thanks for the help!

Carl · March 29, 2020, 3:02pm

Hopefully this isn’t straying too far from the path. I too was struggling with this: I wanted to plot multiple columns of a pandas dataframe I had. The dataframe consisted of a Date column (my x-axis value) and Water / Oil columns. The code below is a slight variation of what has already been posted above; I just wanted to share a simple example of what I have working. The main point was to ensure the Date column was in the correct datetime format and to add x_axis_type="datetime" to the figure.


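from bokeh.palettes import Spectral10
from bokeh.plotting import figure

# df is the dataframe described above, with Date, Water and Oil columns
p = figure(sizing_mode="scale_width", plot_height=85, x_axis_type="datetime")

for column_name, spectral_color in zip(['Water', 'Oil'], Spectral10):
    p.scatter(x=df['Date'], y=df[column_name], size=1.3, color=spectral_color, alpha=0.8,
              legend_label=column_name)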
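A minimal sketch of the datetime conversion step, assuming the Date column arrives as strings. The sample values are invented for illustration.

```python
import pandas as pd

# Invented sample data; Date starts out as plain strings.
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-02-01", "2020-03-01"],
    "Water": [10.0, 12.5, 11.0],
    "Oil": [3.0, 2.5, 4.0],
})

# Convert to a real datetime dtype so a datetime axis can format the ticks.
df["Date"] = pd.to_datetime(df["Date"])
```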

tvkyq · March 30, 2020, 4:34am

Hey @p-himik might be starting to digress here but tooltips don’t seem to be working, probably because of how we’ve brought in the lines with the for loop and df.groupby. It renders but all the values are ‘???’. I hoped this would work:

ttips = HoverTool(tooltips = [
    ("Peak Change", "@{Peak_change}")])
p = figure(title ='whatever')
p.add_tools(ttips)

But no dice. I’ve tried to enter tooltips within the for loop but that doesn’t seem to work as p.line won’t accept tooltips as an argument.

I’ve had a look around on other posts and stackoverflow but I only found posts that are obsolete or involved workarounds for slightly different things I didn’t understand.

Any ideas?

@Carl cool to see others trying to figure out the same thing!

Carl · March 30, 2020, 8:33am

Hi @tvkyq, I think an area to explore is adding the dataframe to a Bokeh ColumnDataSource.

p-himik · March 30, 2020, 9:12am

Carl is correct. By using @{Peak_change} (BTW {} are not necessary here, I think), you’re asking the tooltip to look up this value in the data source of the renderer that you hover over. Your renderers don’t have any data source attached since you just provide the data directly.
Start providing the data via data sources, and make sure they have Peak_change column - the tooltip will work then.

tvkyq · March 30, 2020, 9:37am

Ah, not shown in the code I’ve supplied is a line I’ve since added, source = ColumnDataSource(dataset). Perhaps I could’ve better explained how I’m stuck. My issue is that I don’t know how to assign source to the line renderers.

If I drop source=source into the for loop it throws the error message, ‘Expected x and y to reference fields in the supplied data source’, so I’m now unsure what to do.

p-himik · March 30, 2020, 12:27pm

An analog of what you do:

x = [1, 2, 3]
y = [2, 3, 4]
source = ColumnDataSource(data=dict(x=x, y=y))
plot.line(x=x, y=y, source=source)

Now compare it to what should be done instead:

source=ColumnDataSource(data=dict(x=[1, 2, 3], y=[2, 3, 4]))
plot.line(x='x', y='y', source=source)

Notice how I pass strings that reference columns within the source instead of passing the data directly.

If you think the error message could be improved, let us know.

tvkyq · March 30, 2020, 1:21pm

Hey @p-himik I’m sorry but I’m kinda new to programming and don’t understand how

source=ColumnDataSource(data=dict(x=[1, 2, 3], y=[2, 3, 4]))
plot.line(x='x', y='y', source=source)

could be applied to the dataframe I’m using that’s thousands of rows long with the .groupby('Year') for loop you’ve helped me set up.

Isn’t

x = [1, 2, 3]
y = [2, 3, 4]
source = ColumnDataSource(data=dict(x=x, y=y))

the same as

source=ColumnDataSource(data=dict(x=[1, 2, 3], y=[2, 3, 4]))

this? You just haven’t specified x and y separately, right?

Thanks for being patient with me…

p-himik · March 30, 2020, 1:58pm

Since none of the glyphs share any data, you have to create a separate data source for each glyph. Just put source = ... in the loop body and use it:

for (name, group), color in zip(dataset.groupby('Year'), Spectral11):
    source = ColumnDataSource(group)
    p.line(x='New_ID', y='Peak_change', source=source, legend_label=str(name), color=color)
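For reference, a complete sketch of this loop using a cut-down copy of the sample data from the first post, with a hover tool added along the lines of the tooltip discussed earlier in the thread. The inline CSV is only there so the snippet runs on its own; with a real dataframe, that part would be replaced by the existing `dataset`.

```python
from io import StringIO

import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.palettes import Category20_20
from bokeh.plotting import figure

# A few rows of the sample data from the first post.
dataset = pd.read_csv(StringIO(
    "Year,New_ID,Peak_change\n"
    "1980,1,1\n1980,2,0.985478\n"
    "1990,1,1\n1990,2,0.996124\n"
    "2000,1,1\n2000,2,0.985065\n"
))

p = figure(title="Simple")
for (name, group), color in zip(dataset.groupby("Year"), Category20_20):
    # One source per line; x and y are column names looked up in that source.
    source = ColumnDataSource(group)
    p.line(x="New_ID", y="Peak_change", source=source,
           legend_label=str(name), color=color)

# The tooltip reads Peak_change from the source of the hovered renderer.
p.add_tools(HoverTool(tooltips=[("Peak Change", "@Peak_change")]))
```

Calling `show(p)` (from `bokeh.plotting`) afterwards renders the plot.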

tvkyq · March 30, 2020, 9:09pm

I’m sorry man but I still can’t get this to work. I’m still getting the same error message as before. I feel like an idiot. Here’s all the relevant code:

ttips = HoverTool(tooltips=[
    ("Peak Change", "@Peak_change")])
p = figure(title ='whatever')

for (name, group), color in zip(dataset.groupby('Year'), Category20_20):
    source = ColumnDataSource(group)
    p.line(x=group.New_ID, y=group.Peak_change, legend_label=str(name),
           source=source)

p.add_tools(ttips)

Are you sure source = ColumnDataSource(group) is correct? Earlier I had it ahead of the for loop as source = ColumnDataSource(dataset), as dataset is the name of the dataframe. I also tried source = ColumnDataSource(dataset.groupby(group)) in the for loop but that didn’t work either.

p-himik · March 30, 2020, 9:39pm

This is my line:

p.line(x='New_ID', y='Peak_change', source=source, legend_label=str(name), color=color)

And this is yours:

p.line(x=group.New_ID, y=group.Peak_change, legend_label=str(name),
       source=source)

Apart from the order of the arguments (which doesn’t matter), do you see any difference?

tvkyq · March 30, 2020, 10:13pm

…x=group.New_ID, y=group.Peak_change …

tvkyq · May 6, 2020, 1:35am

Hey @p-himik I’m sorry to bring this up again as you’re probably sick of me and I’ve already made enough of a fool of myself. I’m trying to adapt the logic used to create these lines with ColumnDataSource and I can’t figure out where I’m going wrong.

I have a heatmap you might recognize and the values are controlled by a slider, and I’d like this same controlled data to be shown as lines. I think the issue might be caused by source = ColumnDataSource(column) - can CDS be supplied with whatever you’re trying to create a group / subset of in this way?

from pandas import *
from bokeh.io import show
from bokeh.layouts import column
from bokeh.models import LinearColorMapper, CustomJS, Slider, ColumnDataSource
from bokeh.palettes import Viridis256
from bokeh.plotting import figure

df = DataFrame({'attribute': ['Y', 'Y', 'Y', 'Y', 'Z', 'Z', 'Z', 'Z']
                   , 'period': [1, 2, 3, 4, 1, 2, 3, 4]
                   , '1': [1, 37, 44, 13, 41, 51, 18, 14]
                   , '2': [10, 3, 44, 53, 20, 9, 18, 14]
                   , '3': [80, 37, 22, 13, 13, 44, 18, 14]})

df['period'] = df['period'].astype(str)
periods = df.period.unique().tolist()
attributes = df.attribute.unique().tolist()
selectable_columns = ['1', '2', '3']

source=ColumnDataSource(df)

active = 1
values_select = Slider(title="Values", start=1, end=3, step=1, value=active)
color_mapper = LinearColorMapper(palette=Viridis256, low=df[str(active)].min(), high=df[str(active)].max())
heatmap_fig = figure(x_range=periods, y_range=attributes)
renderer = heatmap_fig.rect(x="period", y="attribute", width=1, height=1, line_color=None, source=source, name=str(active),
                            fill_color={'field': str(active), 'transform': color_mapper})

line_fig = figure()

for column in selectable_columns:
    source = ColumnDataSource(column)
    line_fig.line(x='period'
            , y='attribute'
            , legend_label=str(column)
            , source=source)

values_select.js_on_change('value', CustomJS(args=dict(renderer=renderer, heatmap_fig=heatmap_fig, line_fig=line_fig), code="""\
    const active = cb_obj.value.toString();
    const data = renderer.data_source.data[active];
    renderer.name = active;
    const {transform} = renderer.glyph.fill_color;
    renderer.glyph.fill_color = {field: cb_obj.value, transform: transform};
    heatmap_fig.reset.emit()
    line_fig.reset.emit()
"""))

show(column(values_select, heatmap_fig, line_fig))

The error says it expects a dict or pandas.df … but how else could I supply the data?

_jm · May 6, 2020, 5:25am

Hi @tvkyq

In the for-loop, you are iterating over selectable_columns, which is a list of strings (column names), not data. The error occurs on the first iteration, at the first line of that loop, because you are attempting to make a ColumnDataSource out of the current loop value, '1'.

for column in selectable_columns:
    source = ColumnDataSource(column)
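In plain Python terms, the loop variable is only the column's name; the data has to be looked up by that name. Below is a sketch with a dict of lists standing in for the dataframe. The `table` and `per_column` names are made up for illustration, and which columns actually belong on the line plot remains an open question.

```python
# Stand-in for the dataframe: column name -> list of values.
table = {
    "period": ["1", "2", "3", "4"],
    "1": [1, 37, 44, 13],
    "2": [10, 3, 44, 53],
}
selectable_columns = ["1", "2"]

per_column = {}
for col_name in selectable_columns:
    # col_name is just the string "1" or "2", not the data itself.
    assert isinstance(col_name, str)
    # Look the values up by name to get a dict of columns, which is the
    # kind of input ColumnDataSource accepts.
    per_column[col_name] = {"period": table["period"], "value": table[col_name]}
```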

Another quick observation: don’t reuse the name column for your loop variable, as it shadows the Bokeh column layout function that you’ve imported under the same name.

from bokeh.layouts import column
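A plain-Python sketch of that shadowing, with a made-up `column` function standing in for the Bokeh import. After the loop, the name refers to the last string, so a later call such as `show(column(...))` would fail.

```python
def column(*children):
    """Hypothetical stand-in for the layout function from bokeh.layouts."""
    return list(children)

for column in ["1", "2", "3"]:  # rebinds the name `column` to each string
    pass

try:
    layout = column("slider", "heatmap", "lines")
    error = None
except TypeError as exc:
    # The name now refers to the string "3", which cannot be called.
    error = str(exc)
```

Renaming the loop variable (for example to `col_name`) avoids the clash.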

I hope this helps, and that someone with more context on what you actually want to include in the line plot can answer the question you are asking.