Why does this legend go into party mode, cycling through colors, when I use a slider with a callback?

I’ve set up a legend and it all works fine until I start using a slider to filter results and it suddenly thinks it’s in a nightclub. I think it might be due to it referencing fields from another dictionary with lists of a different length but the factor_cmap works as expected for coloring the circles. Here’s a simplified version of what I’m working with:

from bokeh.plotting import figure, show
from bokeh.models import Slider, CustomJSFilter, CDSView, ColumnDataSource, CustomJS
from bokeh.layouts import layout
from bokeh.transform import factor_cmap

data = dict(Apples=[97, 34, 23, 6, 26, 97, 21, 92, 73, 10, 92, 14, 77, 4, 25, 48, 26, 39, 93],
            Bananas=[87, 63, 56, 38, 57, 63, 73, 56, 30, 23, 66, 47, 76, 15, 80, 78, 69, 87, 28],
            Oranges=[21, 65, 86, 39, 32, 62, 46, 51, 17, 79, 64, 43, 54, 50, 47, 63, 54, 84, 79],
            Category=['A', 'B', 'B', 'C', 'A', 'C', 'B', 'C', 'C', 'B', 'A', 'A', 'B', 'B', 'A', 'C', 'C', 'C', 'C'])
# one color per category, in matching order
colordict = dict(Colors=['red', 'green', 'blue'], Categories=['A', 'B', 'C'])
source = ColumnDataSource(data=data)

MinApples = Slider(start=0, value=50, end=100, step=1)
# re-emit a change on the source whenever the slider moves, so the view's filter is re-run
MinApples.js_on_change('value', CustomJS(args=dict(source=source), code="""source.change.emit()"""))

# boolean mask over the full source: true for rows with more apples than the slider value
custom_filter = CustomJSFilter(args=dict(source=source, MinApples=MinApples), code='''
var indices = [];
for (var i = 0; i < source.get_length(); i++) {
    if (source.data['Apples'][i] > MinApples.value) {
        indices.push(true);
    } else {
        indices.push(false);
    }
}
return indices;''')
view = CDSView(source=source, filters=[custom_filter])

p = figure()
p.circle('Oranges', 'Bananas', source=source, view=view, size=20, legend_field='Category',
         fill_color=factor_cmap('Category', palette=colordict['Colors'], factors=colordict['Categories']))
controls = [MinApples]
l = layout([[controls, p]], sizing_mode="stretch_both")
show(l)

Could someone please help me understand what I might be doing wrong?

Also, I’m trying to figure out how to hide legend entries when they’re not being used (e.g., set the slider to 96 and C isn’t shown but is still present in the legend), as this post asked back in ’16. The response at the time was that it could be programmed explicitly through some code I didn’t understand, and that major improvements were coming. Given that answer was from 0.12.2 and we’re now on 1.4.0, is there now a simpler way to do this?

Thanks for the help!

Hi tvkyq,

It’s possible, though not yet definite, that there’s a bug here in the way categorical legends behave when CDSViews are involved. I need to work up a minimal example to prove it and to decide which elements are important to the problem. My testing so far suggests that the factor_cmap isn’t the issue: if we instead add a column of explicitly defined colors to your data source, the same thing occurs. Anyway, I’ll keep working on that, and if we determine that it’s not behaving as it ought to, I’ll submit a GitHub issue and update this thread with a link to it.

In the meantime, we have a couple options for workarounds. We could revisit the original example you cited in your other post, the example in which a new figure is being drawn on each update. That’s possible, but I imagine your real-life end goal is more complicated than the apples, bananas, and oranges you have here, and I’d be wary about scaling.

Perhaps a better option would be to avoid using a CDSView, instead using your original data to construct one large, source-of-truth CDS. Then, with each update, your callback would build a CDS that’s a subset of that untouched original. The subset CDS could then be assigned to be the data source for your circle glyph.
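A minimal pure-Python sketch of that idea, with plain dicts of lists standing in for the two ColumnDataSources (the function name rebuild_subset and the sample numbers are illustrative, not Bokeh API). What it shows is that the master data is never modified, so every slider update can rebuild the subset from scratch, and lowering the threshold brings filtered-out rows back. In Bokeh the same loop would have to run inside a CustomJS callback, with the result becoming the glyph’s data.

```python
def rebuild_subset(master, column, minimum):
    """Rebuild the subset from scratch: keep every row whose `column` value exceeds `minimum`.

    `master` is the untouched source of truth (a dict of equal-length lists).
    """
    keep = [i for i, value in enumerate(master[column]) if value > minimum]
    return {name: [values[i] for i in keep] for name, values in master.items()}


master = {
    "Apples": [97, 34, 23, 6, 26],
    "Category": ["A", "B", "B", "C", "A"],
}

# Simulate two slider moves: the subset is always rebuilt from `master`,
# so lowering the threshold brings rows back.
high = rebuild_subset(master, "Apples", 50)
low = rebuild_subset(master, "Apples", 20)
print(high)  # {'Apples': [97], 'Category': ['A']}
print(low)   # {'Apples': [97, 34, 23, 26], 'Category': ['A', 'B', 'B', 'A']}
```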

I hope that makes sense, with the caveat that I have not built a working example of it. I believe it will get around this issue of the grouped categorical legend and the colors assigned to legend items not matching up. Please give it a shot and let us know what questions you have along the way. Thanks for your patience while this was investigated!

Ah! That’s a pity it’s not working as intended. Thank you very much for getting back to me anyway!

I’m sorry to say I don’t understand what you mean about avoiding a CDSView. My layman’s understanding is that the ColumnDataSource is Bokeh handling the full set of data (which we normally define with source = ColumnDataSource(data=data), as shown above), and the CDSView is that same data after it’s been through whatever the view’s filters have done to it. So if we simply use the CDS (without the view applied), how would we build a CDS that’s a subset of the original? It kinda sounds like we’d be creating a CDSView without calling it that. Or have I completely misunderstood how this works?

Here’s an example. You’re right in that we’re basically replicating the functionality of a CDSView without using that class, since it’s the class that seems to be causing the legend issues.

from bokeh.io import show
from bokeh.models import Slider, CustomJS, ColumnDataSource
from bokeh.plotting import figure
from bokeh.layouts import column

master_cds = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5, 6, 7, 8, 9],
    y=[3, 2, 1, 3, 2, 1, 3, 2, 1],
    color=['red', 'red', 'red', 'green', 'green', 'green', 'blue', 'blue', 'blue'],
    group=['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
))

subset_cds = ColumnDataSource(data=master_cds.data.copy())

min_x_slider = Slider(start=0, value=0, end=10, step=1)
new_callback = CustomJS(args=dict(master_cds=master_cds, subset_cds = subset_cds, min_x_slider=min_x_slider), code="""
    // clear out subset CDS to be rebuilt
    subset_cds.clear();
    
    // create a subset CDS by pulling out only the rows whose x-value is greater than our set minimum.
    // iterate over the contents of the 'x' array and check each value.
    for (let i = 0; i < master_cds.data.x.length; i++) {
      if (master_cds.data.x[i] > min_x_slider.value) {
        subset_cds.data.x.push(master_cds.data.x[i]);
        subset_cds.data.y.push(master_cds.data.y[i]);
        subset_cds.data.color.push(master_cds.data.color[i]);
        subset_cds.data.group.push(master_cds.data.group[i]);
      }
    };      
    subset_cds.change.emit();
    """
    )
min_x_slider.js_on_change('value', new_callback)

p = figure(x_range=(0, 10), y_range=(-1, 5), plot_height=300, tools='save')

p.circle(x='x', y='y', radius=0.5, color='color', legend_field='group', source=subset_cds)

col = column([p, min_x_slider])
show(col)

Hey carolyn your example works beautifully but I don’t know how to adapt it to the more complex model I’m working on.

The main issue for the moment is that I’m working with around a dozen fields. Do you have any ideas on how to push all of them without typing out subset_cds.data.field_n.push(master_cds.data.field_n[i]); for every field (and the fields often change)? Maybe set up another kind of for loop within JavaScript?

And sorry for not replying for so long! I’ve been travelling and have been flat out juggling all sorts of things.

It’s hard to say without more specifics, but you’re probably on the right track with setting up an iterator in the JS.

For example, you could maintain a list of the fields that need updating, set up a for loop to iterate over that list, and for each one call subset_cds.data[fieldname].push(master_cds.data[fieldname][i]).
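Sketched in Python, since the structure carries straight over to the JavaScript: an outer loop over rows and an inner loop over a list of field names, with append playing the role of push. The names fields_to_update and min_x and the data are illustrative. Adding a column then means adding its name to the list rather than writing another push line, and list(master.keys()) would copy every column.

```python
# Hypothetical list of the columns to copy; list(master.keys()) would copy them all.
fields_to_update = ["x", "y", "group"]

master = {
    "x": [1, 2, 3, 4],
    "y": [3, 2, 1, 3],
    "group": ["A", "A", "B", "C"],
}
min_x = 2

subset = {field: [] for field in fields_to_update}
for i in range(len(master["x"])):          # outer loop: rows
    if master["x"][i] > min_x:
        for field in fields_to_update:     # inner loop: one append per listed field
            subset[field].append(master[field][i])

print(subset)  # {'x': [3, 4], 'y': [1, 3], 'group': ['B', 'C']}
```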

Hmm, I think the main issue is that this JavaScript is well beyond me. I chose Bokeh because I’m new to programming and only know a little Python, and I didn’t realize JavaScript would be required to build interactions that don’t involve any kind of server (which is also well beyond my current ambitions).

Here’s a simplified version of the filter I’ve been drafting now, which I’ve adapted from other examples:

var indices = [];
for (var i = 0; i < source.get_length(); i++) {
    if (source.data['Periodno'][i] == Periodsl.value) {
        if (source.data['Sectors Scheduled'][i] >= SchedSlider.value[0] &&
            source.data['Sectors Scheduled'][i] <= SchedSlider.value[1] &&
            checkbox_group.active.map(k => checkbox_group.labels[k]).includes(source.data['Main_Airline'][i]) &&
            arriving_select.value.includes(source.data['Arriving_short'][i]) &&
            departing_select.value.includes(source.data['Departing_short'][i])) {
            indices.push(true);
        } else {
            indices.push(false);
        }
    } else {
        indices.push(false);
    }
}
return indices;

Note that there’s a nested if statement there. The model was getting quite laggy as I’m dealing with a big dataset, and I thought nesting would speed things up, so I moved heaven and earth figuring out how to nest it, and there was no observable difference in performance lol :P
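One likely reason the nesting made no observable difference: && in JavaScript, like and in Python, short-circuits, so a flat condition already stops at the first check that fails, provided the Periodno check comes first there too. The Python sketch below uses a hypothetical check helper that records which conditions actually run; it illustrates evaluation order only and says nothing about where the real model spends its time.

```python
calls = []

def check(name, result):
    """Hypothetical helper: record that a condition was evaluated, then return its result."""
    calls.append(name)
    return result

# Flat version: `and` stops at the first False, just like && in JavaScript.
flat = check("period", False) and check("sectors", True) and check("airline", True)
flat_calls = list(calls)

calls.clear()
# Nested version: the inner conditions only run if the outer one passes.
nested = False
if check("period", False):
    if check("sectors", True) and check("airline", True):
        nested = True
nested_calls = list(calls)

print(flat, flat_calls)      # False ['period']
print(nested, nested_calls)  # False ['period']
```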

First off, I don’t understand why I’ve got var indices = [], which seems redundant, but the whole thing stops working when I remove it (and the var in the second line) to try to match your code.

I also don’t understand why my example has indices.push(true) / indices.push(false), while yours has subset_cds.data.x.push(master_cds.data.x[i]). I thought I could adapt the code and then go about creating an iterator / for loop for all of the fields, but I can’t even get to that stage! I’m really lost here lol

Maybe I should just start properly learning JavaScript!

And again, so sorry for all the stupid questions, and thank you so much for all your help! I sincerely appreciate it. If Bokeh is a non-profit, please let me know how I can support you / Bokeh and I’ll make a donation.

Thanks for the kind words @tvkyq. Bokeh is a fiscally sponsored project of NumFOCUS, which is a 501(c)(3) nonprofit charity in the United States. Donations are certainly appreciated (and tax-deductible in the US to the extent provided by law). To make a donation to Bokeh through NumFOCUS, you can visit:


Cool, I just donated!

I went to edit down my previous post, as I wrote it poorly while in a hurry, but it seems I no longer have the option to. Oh well. The example code I gave was far too long, so here’s a simpler example:

custom_filter = CustomJSFilter(args=dict(source=source, AppleSlider=AppleSlider), code='''
var indices = [];
for (var i = 0; i < source.get_length(); i++) {
    if (source.data['Apples'][i] >= AppleSlider.value[0]) {
        indices.push(true);
    } else {
        indices.push(false);
    }
}
return indices;
''')

I think I sorta understand why, instead of indices.push(true), carolyn’s example has subset_cds.data.x.push(master_cds.data.x[i]): we’re no longer creating a list (of indices?) saying which rows should be filtered or not; instead we’re creating a subset data set, right? So in the example I’ve included in this reply, I could replace indices.push(true) with some kind of for loop that builds the new subset data set, correct?
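The distinction described here, sketched in Python with illustrative data: the CustomJSFilter style produces one true/false per row of the full data set (that is what indices accumulates, which is why it has to start as an empty array declared before the loop), while the workaround copies only the matching rows into new columns.

```python
master = {"Apples": [97, 34, 23, 6], "Category": ["A", "B", "B", "C"]}
min_apples = 25

# Filter style: one True/False per row of the *full* data set.
# The data itself is untouched; a view decides which rows to draw.
mask = []                      # like `var indices = [];`
for value in master["Apples"]:
    mask.append(value >= min_apples)

# Workaround style: copy the matching rows into a separate, smaller data set.
subset = {name: [col[i] for i, keep in enumerate(mask) if keep]
          for name, col in master.items()}

print(mask)    # [True, True, False, False]
print(subset)  # {'Apples': [97, 34], 'Category': ['A', 'B']}
```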

Given I know next to nothing about JavaScript, could someone please help me figure out how to write a for loop in JavaScript like carolyn mentions, or maybe point me to an example?

Again, sorry for all the dumb questions!

Hi tvkyq,

Here’s my proposed method in a working example. It involves nested for loops, which can be a hassle to think about, and I’m always interested in hearing about other ways to solve things, but this gets the job done. I’ve added comments to try to explain the logic. In a nutshell: for each row matching our minimum criterion (i in the outer for loop), and for each column in our list (j in the inner for loop), push the value to the subset. And you’re correct that what we’re doing here is building a whole subset rather than a list of indices.

from bokeh.io import show
from bokeh.models import Slider, CustomJS, ColumnDataSource
from bokeh.plotting import figure
from bokeh.layouts import column

master_cds = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5, 6, 7, 8, 9],
    y=[3, 2, 1, 3, 2, 1, 3, 2, 1],
    color=['red', 'red', 'red', 'green', 'green', 'green', 'blue', 'blue', 'blue'],
    group=['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
))

# this is our list of names of columns that we want to update when the slider changes.
fields_to_update = ['x', 'y', 'color', 'group']

subset_cds = ColumnDataSource(data=master_cds.data.copy())

min_x_slider = Slider(start=0, value=0, end=10, step=1)
new_callback = CustomJS(args=dict(master_cds=master_cds,
                                  subset_cds=subset_cds,
                                  min_x_slider=min_x_slider,
                                  fields_to_update=fields_to_update),
                        code="""
    // clear out subset CDS to be rebuilt
    subset_cds.clear();
    
    // create a subset CDS by pulling out only the rows whose x-value is greater than our set minimum.
    // iterate over the contents of the 'x' array and check each value.
    for (let i = 0; i < master_cds.data.x.length; i++) {
      // if the x-value of this row of the master is greater than our minimum, we want to push it to the subset.
      if (master_cds.data.x[i] > min_x_slider.value) {
        // iterate over our list of columns we want to update.
        for (let j = 0; j < fields_to_update.length; j++) {
          // for each column, push this value to the subset.
          subset_cds.data[fields_to_update[j]].push(master_cds.data[fields_to_update[j]][i]);
        }
      }
    };      
    subset_cds.change.emit();
    """
    )
min_x_slider.js_on_change('value', new_callback)

p = figure(x_range=(0, 10), y_range=(-1, 5), plot_height=300, tools='save')

p.circle(x='x', y='y', radius=0.5, color='color', legend_field='group', source=subset_cds)

col = column([p, min_x_slider])
show(col)

OK great! Your reply makes a ton of sense! We’re getting there!

It’s not quite working for me yet, though. It renders, so the JavaScript syntax seems to be acceptable, but the full unfiltered data set is showing and the widgets aren’t doing anything.

I think the problem might be the widgets’ js_on_change('value', new_callback) call. In the code I’ve been drafting, the widgets’ js_on_change call is a little longer: js_on_change('value', CustomJS(args=dict(subset_cds=subset_cds), code="""subset_cds.change.emit()""")). Is this necessary? I tried adapting it to your code as .js_on_change('value', new_callback), but it throws an error: ‘ValueError: not all callback values are CustomJS instances’. Any idea why it could be telling me this?

Hm. You might have to paste some code for that one. All I’m doing in mine is defining the callback up front, assigning it to a variable new_callback, and then passing that as the js_on_change argument.

The draft of what I’m working on is now a big, convoluted file that’ll take me some time to simplify. The main difference I’ve just noticed is that it uses a CustomJSFilter, not a CustomJS. I’ve tried changing it to CustomJS, but that makes it completely shit the bed and I don’t know where to begin converting things from there :P

I tried rebuilding my overall model on the template of your latest example, which worked so nicely, changing things bit by bit until I found the important difference. I began by changing the data source to a dict built from the Excel file I’m using; that rendered, but the plot disappears when I change the slider. I then tried a simpler Excel file with columns A, B, C of random values and it all worked perfectly, so I don’t know what I’m doing wrong…

The data seems to be stored in the same way and holds the same kinds of values. It really looks like the same thing, although more likely there’s some obvious thing I’m overlooking.

Man, this has been a humbling experience to say the least! I must look like such a moron on this forum lol

I’m not quite ready to post the whole script and accompanying Excel file publicly, so maybe it’s best I send them to you directly via PM.

On a side note, is there any chance this CDSView issue might be resolved in a new patch? I think I’ve bitten off more than I can chew!

Alright! I think I’ve figured out why I can’t replicate your example with a larger excel file!

It seems that the dict is too large, or at least has too many rows. I’m guessing the subset approach we’ve created is the issue here, as everything has worked fine with the far larger data set I’m working on. Here’s the code I’ve been trying, which is a close variation of your previous example:

from bokeh.io import show
from bokeh.models import Slider, CustomJS, ColumnDataSource
from bokeh.plotting import figure
from bokeh.layouts import column
import pandas as pd

df = pd.read_excel('fruitstest.xlsx', 'Working')
# one entry per spreadsheet column: column name -> column values
data = {col: df[col] for col in df.columns}

fields = list(data.keys())
master_cds = ColumnDataSource(data=data)
subset_cds = ColumnDataSource(data=master_cds.data.copy())
min_x_slider = Slider(start=0, value=300, end=5000, step=100)
new_callback = CustomJS(args=dict(master_cds=master_cds,
                                  subset_cds=subset_cds,
                                  min_x_slider=min_x_slider,
                                  fields=fields),
                        code="""
    subset_cds.clear();
    for (let i = 0; i < master_cds.data.Apples.length; i++) {
      if (master_cds.data.Apples[i] > min_x_slider.value) {
        for (let j = 0; j < fields.length; j++) {
          subset_cds.data[fields[j]].push(master_cds.data[fields[j]][i]);
        }
      }
    };      
    subset_cds.change.emit();
    """  )
min_x_slider.js_on_change('value', new_callback)
p = figure(plot_height=300)
p.circle(x='Bananas', y='Oranges', radius=1, legend_field='Farm', source=subset_cds)
col = column([p, min_x_slider])
show(col)

And here’s the Excel file. I tried to upload it to this post, but I could only upload images. Hope this link is OK.

Note there are two tabs in the Excel file, so you’ll have to switch the sheet name in df = pd.read_excel('fruitstest.xlsx', 'Working') between ‘Working’ and ‘Broken’. The file path is relative, so the file needs to be in the same folder as the script (like you need to be told that lol).

Do you see what I mean? Hope we can figure this out soon! This is killing me!

And thank you again so so much for all your help!

I did some further testing, and it looks like the only reason the legend in my example is well-behaved is that as we move the slider left to right, the elements that remain in the ‘group’ column are in the order we expect them to be (ABC, then just BC as all the As are eliminated, then just C).

If you change the CDS in my example to:

master_cds = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5, 6, 7, 8, 9],
    y=[3, 2, 1, 3, 2, 1, 3, 2, 1],
    color=['red', 'green', 'blue', 'red', 'green', 'blue', 'red', 'green', 'blue'],
    group=['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C']
))

… then you see, as you move the slider, that it also goes into party mode.
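A pure-Python illustration of the ordering effect, using the interleaved group column above. It assumes, as this testing suggests but without confirming Bokeh’s internals, that legend entries follow the order in which each group first appears in the remaining rows; first_appearance_order is a hypothetical helper, not a Bokeh function. With interleaved data that order rotates as the slider moves, while with the original grouped data it stays A, B, C.

```python
def first_appearance_order(values):
    """Unique values in the order they first occur (assumed to mirror legend order)."""
    seen = []
    for v in values:
        if v not in seen:
            seen.append(v)
    return seen

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
group = ["A", "B", "C", "A", "B", "C", "A", "B", "C"]

orders = {}
for min_x in (0, 1, 2):
    # rows that survive the slider filter (x greater than the minimum)
    remaining = [g for xi, g in zip(x, group) if xi > min_x]
    orders[min_x] = first_appearance_order(remaining)
    print(min_x, orders[min_x])
# 0 ['A', 'B', 'C']
# 1 ['B', 'C', 'A']
# 2 ['C', 'A', 'B']
```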

So the question is: can you impose a sort order on legend entries? My initial searching suggests maybe not at this time, at least for this specific use case. Others have been down this road before with GroupBy legends, but the examples I’ve found 1) acknowledge that this is an area for improvement, and 2) don’t take into account your use case of restructuring the dataset.
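If the row order doesn’t matter for a given data set, one untested way to impose an order is to sort the rows by category once, in Python, before building the master CDS: any subset filtered from sorted rows keeps its categories in the same relative order. A minimal sketch with a hypothetical sort_rows_by helper on a dict of lists (with pandas, df.sort_values('Farm') would do the same job before the dict is built). This targets the ordering symptom only, not whatever is going on inside the legend itself.

```python
def sort_rows_by(data, key):
    """Hypothetical helper: copy a dict-of-lists with every column reordered so `key` ascends."""
    order = sorted(range(len(data[key])), key=lambda i: data[key][i])  # stable sort of row indices
    return {name: [values[i] for i in order] for name, values in data.items()}

data = {
    "x": [1, 2, 3, 4, 5, 6],
    "group": ["A", "B", "C", "A", "B", "C"],
}
sorted_data = sort_rows_by(data, "group")
print(sorted_data)  # {'x': [1, 4, 2, 5, 3, 6], 'group': ['A', 'A', 'B', 'B', 'C', 'C']}

# Filtering the sorted rows (here: x > 2) keeps the groups in A, B, C order.
remaining = [g for xi, g in zip(sorted_data["x"], sorted_data["group"]) if xi > 2]
print(remaining)  # ['A', 'B', 'C', 'C']
```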

Arrghhhh!!! This bloody party mode lol. We were so close!

My data’s order doesn’t matter, so yeah, I could impose a sort order on legend entries. But I don’t think this solves my other problem as your workaround doesn’t seem to work after 1000 rows or so for reasons I don’t understand, and my data is well beyond that. Did you see my earlier reply with the .xlsx and my code with the ‘Broken’ / ‘Working’ sheets?

Looks like I’ll have to finally give up on this and remove the legend from what I’m making… Reckon there’s any chance this could be resolved in a future update? Would love to see it going, especially in the way of your example where whatever isn’t shown in the figure isn’t shown in the legend.

And once again, thank you so much for all the help!


I’ve put in a GitHub issue here with what I know; feel free to subscribe for updates!
