Hover tool in bokeh boxplot

alpha_corporate · May 20, 2022, 10:49am

import pandas as pd
import numpy as np

from bokeh.io import output_notebook, show,save
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource,HoverTool, Range1d,Legend
from bokeh.palettes import Reds,Greys
from bokeh.layouts import column,row,grid
from bokeh.transform import factor_cmap, dodge
from bokeh.models import BasicTickFormatter

output_notebook()

df = pd.DataFrame({'Group' : np.random.choice(['A','B','C','D'], 400),
                   'Points1' : np.random.randint(100,1000,400),
                   'Points2': np.random.randint(100,1000,400)
                  })
df['Total']=df.Points1+df.Points2
df

source=ColumnDataSource(data=dict(df,p1=df.Points1))

def box_plot(df, vals, label, ylabel=None,xlabel=None,title=None):
 
    # Group Data frame
    df_gb = df.groupby(label)
    # Get the categories
    cats = list(df_gb.groups.keys())

    # Compute quartiles for each group
    q1 = df_gb[vals].quantile(q=0.25)
    q2 = df_gb[vals].quantile(q=0.5)
    q3 = df_gb[vals].quantile(q=0.75)
                       
    # Compute interquartile region and upper and lower bounds for outliers
    iqr = q3 - q1
    upper_cutoff = q3 + 1.5*iqr
    lower_cutoff = q1 - 1.5*iqr

    # Find the outliers for each category
    def outliers(group):
        cat = group.name
        outlier_inds = (group[vals] > upper_cutoff[cat]) \
                                     | (group[vals] < lower_cutoff[cat])
        return group[vals][outlier_inds]

    # Apply outlier finder
    out = df_gb.apply(outliers).dropna()

    # Points of outliers for plotting
    outx = []
    outy = []
    for cat in cats:
        # only add outliers if they exist
        if cat in out and not out[cat].empty:
            for value in out[cat]:
                outx.append(cat)
                outy.append(value) 
                
    # If outliers, shrink whiskers to smallest and largest non-outlier
    qmin = df_gb[vals].min()
    qmax = df_gb[vals].max()
    upper = [min([x,y]) for (x,y) in zip(qmax, upper_cutoff)]
    lower = [max([x,y]) for (x,y) in zip(qmin, lower_cutoff)]

    cats = [str(i) for i in cats]
    # Build figure
    p = figure(sizing_mode='stretch_width', x_range=cats,height=300,toolbar_location=None)
    p.xgrid.grid_line_color = None
    p.ygrid.grid_line_width = 2
    p.yaxis.axis_label = ylabel
    p.xaxis.axis_label = xlabel
    p.title=title
    p.y_range.start=0
    p.title.align = 'center'
    
    # stems
    p.segment(cats, upper, cats, q3, line_width=2, line_color="black")
    p.segment(cats, lower, cats, q1, line_width=2, line_color="black")

    # boxes
    p.rect(cats, (q3 + q1)/2, 0.5, q3 - q1, fill_color=['#a50f15', '#de2d26', '#fb6a4a', '#fcae91'], 
           alpha=0.7, line_width=2, line_color="black")

    # median (almost-0 height rects simpler than segments)
    p.rect(cats, q2, 0.5, 0.01, line_color="black", line_width=2)

    # whiskers (almost-0 height rects simpler than segments)
    p.rect(cats, lower, 0.2, 0.01, line_color="black")
    p.rect(cats, upper, 0.2, 0.01, line_color="black")

    # outliers
    p.circle(outx, outy, size=6, color="black")
    
    p.add_tools(HoverTool(tooltips=[('Points','@p1')]))

    return p

p = box_plot(df, 'Points1', 'Group', ylabel='Total spread',title='BoxPlot')
show(p)

I am trying to show information for each outlier when it is hovered over (there are none in this particular dataset), as well as each statistical value such as the quartiles and median, but when I hover it only shows “???”.

I am a noob so any guidance/edits would be really helpful

gmerritt123 · May 20, 2022, 11:38am

I haven’t completely gone through your code/tested etc. but it looks to me like you are not using your ColumnDataSource (CDS) to drive the renderers (i.e. your rect and segment glyphs).

The workflow should be: use pandas to get the data into a format that has all the fields needed to drive the renderers before instantiating the CDS. That is, do all your quantile calcs etc. first to get the necessary args for the segment and rect, along with the attributes you want to show in the hover, THEN make a CDS from that and use that CDS to drive the renderers. Finally, I find that keeping references to the renderers makes building hover tools much easier, because you can then control which renderers trigger which hover (which becomes a huge thing when your plots get more complex).

Commented example:

import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, show

# Assemble your data via pandas so it contains exactly the inputs needed to drive the renderers (i.e. do all your preprocessing first).
# Carry along whatever fields you want to show via hover within the same dataframe.
# For the box/whisker plot you're making segments (need x0, x1, y0, y1) and rects (need x, y, width, height), where x and y are centre coords.
# One good shortcut for your box plot is that x0, x1, and x will all be the same, so one field can serve all three args.

df = pd.DataFrame(data={'x':[1,2,3],'yo':[1,2,0.5],'y1':[3,5,2.5],'yrect':[2,3,1],'w':[0.2,0.2,0.2],'h':[0.5,1,1]
                       ,'thing':['orange','banana','apple'],'color':['orange','yellow','red']})

#Once you get to this point, THEN make your ColumnDataSource
src = ColumnDataSource(df)

p = figure()

#now make your renderers (i.e. rects and segs), pointing to src and the corresponding fields in the src to inform the renderer's required args
#also note how i'm assigning references to these renderers (i.e. seg_rend and rect_rend),
#so we can tell the hovertool to specifically run off them
seg_rend = p.segment(x0='x',x1='x',y0='yo',y1='y1',line_color='black',source=src) #'black' is not a field in src but bokeh will "magically" know we're telling it to make all lines black 
rect_rend = p.rect(x='x',y='yrect',width='w',height='h',fill_color='color',source=src) #bokeh will find the color field to inform the fill color of each rect

#now make the hovertool and tell it to run off these renderers
hvr= HoverTool(renderers=[seg_rend,rect_rend],tooltips=[('Fruit','@thing')])
#add this hovertool to the figure
p.add_tools(hvr)

show(p)
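
For what it's worth, here is a rough sketch of how the same workflow might look for the box plot in the original post, covering only the boxes and their quartiles. The names `stats`, `box_src`, `box_rend`, `box_y` and `box_h` are made up for illustration, the data is seeded random data rather than anything real, and the `{0.0}` tooltip formatting is just one reasonable choice:

```python
import numpy as np
import pandas as pd
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Synthetic data shaped like the original post (seeded so runs are repeatable).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Group": rng.choice(["A", "B", "C", "D"], 400),
    "Points1": rng.integers(100, 1000, 400),
})

# 1. Do all the statistics in pandas first: one row per group.
grouped = df.groupby("Group")["Points1"]
stats = pd.DataFrame({
    "q1": grouped.quantile(0.25),
    "q2": grouped.quantile(0.50),
    "q3": grouped.quantile(0.75),
}).reset_index()
stats["box_y"] = (stats["q1"] + stats["q3"]) / 2  # rect centre
stats["box_h"] = stats["q3"] - stats["q1"]        # rect height (the IQR)

# 2. THEN build the ColumnDataSource from the finished table.
box_src = ColumnDataSource(stats)

# 3. Drive the renderer with column names from that source and keep a reference.
p = figure(x_range=list(stats["Group"]), height=300, toolbar_location=None)
box_rend = p.rect(x="Group", y="box_y", width=0.5, height="box_h",
                  fill_color="#fb6a4a", line_color="black", source=box_src)

# 4. Every @name in the tooltips must be a column of the source behind box_rend.
hover = HoverTool(renderers=[box_rend], tooltips=[
    ("Group", "@Group"),
    ("Q1", "@q1{0.0}"),
    ("Median", "@q2{0.0}"),
    ("Q3", "@q3{0.0}"),
])
p.add_tools(hover)
```

Calling `show(p)` renders it. Whiskers could presumably be added as extra columns in the same table, and outliers would likely want their own source with one row per point.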

alpha_corporate · May 20, 2022, 2:26pm

so before figure I have specified:

median_source=ColumnDataSource(dict(x=cats,y=q2))

which corresponded with:

p.rect('cats', 'q2', 0.5, 0.01, line_color="black", line_width=2,source=median_source)

but I still get ???, and also the median line has disappeared.

P.S. I did not write this code; I am trying to extend it by adding the stat details to the hover tool, which weren't provided.

any help please

gmerritt123 · May 20, 2022, 2:42pm

In what you’ve just written, median_source has two fields, called x and y, with x containing the list of cats and y containing the list of q2s.

So you’d have to point to the x and y fields respectively:

p.rect(x='x', y='y', width=0.5, height=0.01, ..., source=median_source)

(This is why you’re getting the bad_column_name error)

alpha_corporate · May 20, 2022, 3:19pm

I have done what you have suggested but no success unfortunately

gmerritt123 · May 20, 2022, 3:21pm

First: That is a python generic error, I suggest looking up what that means.

Secondly, you haven’t changed what I suggested.

alpha_corporate · May 20, 2022, 3:24pm

I thought x='cats' and y='q2' in p.rect was what you had suggested, as that is what I have highlighted in the attached image? Apologies for any misinterpretation, and thank you for your help thus far.

gmerritt123 · May 20, 2022, 3:32pm

This has nothing to do with the hovertool anymore:

You create a CDS via a dictionary:

median_source =ColumnDataSource(dict(x=cats,y=q2))

That CDS now has two columns, named x and y respectively because of the keys you’ve assigned in the dictionary. You could call it:

median_source =ColumnDataSource(dict(harry=cats,sally=q2))

and median_source would have two columns named harry and sally respectively.

When you create the rect renderer, you point to the names of the columns for the specific args the renderer needs. In the case of rect, it needs an x and a y arg… so :

p.rect(x='harry', y='sally', width=0.5, height=0.01, ..., source=median_source)

Edit: I recently had a discussion about all the things you can use to instantiate a renderer, and the fact that you can mix them (which you are doing), here → Glyphs with nonexistent column names (non-continuous ranges!) - #4 by gmerritt123. Maybe it will help explain things a bit more.
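
The column-naming point can be checked directly with a small sketch. The `cats` and `q2` values here are stand-ins made up for illustration (not results from the data above), and `median_rend` is just a name chosen for the renderer reference:

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

cats = ["A", "B", "C", "D"]        # stand-in categories (assumed)
q2 = [540.0, 556.5, 530.0, 548.0]  # stand-in medians (assumed values)

# The column names are the dict keys, not the Python variable names.
median_source = ColumnDataSource(dict(harry=cats, sally=q2))
print(median_source.column_names)

# The glyph's string args must match those column names.
# Numeric args (width, height) are plain values applied to every rect.
p = figure(x_range=cats, height=300)
median_rend = p.rect(x="harry", y="sally", width=0.5, height=0.01,
                     line_color="black", line_width=2, source=median_source)
```

Passing `x='cats'` or `y='q2'` with this source would point at columns that don't exist in it, which is the mismatch described above.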

alpha_corporate · May 20, 2022, 3:55pm

Thank you so much it finally worked! I am very grateful for your help kind sir!
