Bokeh BoxPlot draws wrong whiskers

Dear bokeh community!

I have a pandas DataFrame and I want to visualize the data as box plots. First I used the pandas ‘df.boxplot’ method, then the seaborn package. Both gave the same result (apart from colours and styling). Then I used Bokeh and got different boxes! The whiskers’ values were invalid: they were values that do not exist in the data. Did I do something wrong, or is there a bug in Bokeh? The data types are float.

My data can be downloaded from here: http://www.mavir.hu/web/mavir-en/consumption
On that page, choose the “Actual VER data” menu button above the chart, between the “Transparency” and “Gross energy” buttons. Then click “Export to table”, enter the date range “From date: 2009.01.01.”, “To date: 2010.01.01.”, and click export.

The downloaded “export2009-01-01.xls” file can then be loaded into pandas:
import pandas as pd
df_2009_15min = pd.read_excel('export2009-01-01.xls')

Then I renamed the columns:
df_2009_15min.columns = ['date_time', 'load_mw']

I created the “a_date” column to store only the date without the time (and “a_time” for the time part only). Then I plotted only the data before 2009-02-01 with pandas:
df_2009_15min.query('date_time < "2009-02-01"')[['a_date', 'load_mw']].boxplot(by='a_date')
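The post does not show how a_date and a_time were built. A minimal sketch, assuming date_time holds parseable timestamps, uses the pandas .dt accessor; the three sample rows below are made up for illustration and are not the MAVIR data.

```python
import pandas as pd

# Made-up rows standing in for the exported 15-minute data (hypothetical values).
df_2009_15min = pd.DataFrame({
    "date_time": ["2009-01-01 00:00", "2009-01-01 00:15", "2009-01-02 00:00"],
    "load_mw": [4100.0, 4050.5, 3990.2],
})

# Make sure the timestamps are real datetimes, not strings.
df_2009_15min["date_time"] = pd.to_datetime(df_2009_15min["date_time"])

# Split each timestamp into a calendar date and a time of day.
# Both new columns have dtype 'object' and hold datetime.date / datetime.time values.
df_2009_15min["a_date"] = df_2009_15min["date_time"].dt.date
df_2009_15min["a_time"] = df_2009_15min["date_time"].dt.time

print(df_2009_15min[["a_date", "a_time", "load_mw"]])
```

If a_date was created this way, its values are Python date objects rather than strings, which matters for the column labels produced by the pivot further down.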

Then I plotted the same data range with seaborn:
import seaborn as sns
sns.boxplot(x='a_date', y='load_mw', data=df_2009_15min.query('date_time < "2009-02-01"'))

The results were the same. Then I also tried Bokeh, hoping to get the most user-friendly and handy plot. First of all, I did not find an option to set the ‘x’ and ‘y’ axes as in seaborn, nor a ‘by’ option as in “pandas.boxplot()”. Either of these options would be great! So I had to pivot the DataFrame (column ‘a_time’ is derived from the original date attribute):
import pandas as pd
from bokeh.charts import BoxPlot  # high-level charts API of that Bokeh version (removed in later releases)
from bokeh.plotting import show
x = df_2009_15min.query('date_time < "2009-02-01"')[['a_time', 'a_date', 'load_mw']].copy()
# Without the next line Bokeh raises an error, apparently treating the column labels
# (e.g. 2009-01-01) as dates instead of strings. WHY?* (see below for details)
x['a_date'] = x['a_date'].astype(str).str.replace('-', '_')
y = pd.pivot_table(x, index='a_time', columns='a_date', values='load_mw')
boxplot = BoxPlot(y, marker="circle", outliers=True, title="boxplot", xlabel="dates", ylabel="loads [MW]")
show(boxplot)
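The pivot step turns the long table (one row per reading) into a wide one with one column per day, which is the shape the BoxPlot call receives. A minimal sketch with made-up values and labels that are already strings:

```python
import pandas as pd

# Hypothetical long-format data: one row per (date, time slot) reading.
x = pd.DataFrame({
    "a_time": ["00:00", "00:15", "00:00", "00:15"],
    "a_date": ["2009_01_01", "2009_01_01", "2009_01_02", "2009_01_02"],
    "load_mw": [4100.0, 4050.5, 3990.2, 3950.0],
})

# Wide format: one row per time slot, one column per date.
# pivot_table averages duplicate (a_time, a_date) pairs by default (aggfunc='mean'),
# so each cell is an observed value only when every pair occurs once.
y = pd.pivot_table(x, index="a_time", columns="a_date", values="load_mw")
print(y)

# Per-day extremes: any correctly drawn whisker must lie within these bounds.
print(y.agg(["min", "max"]))
```

Running y.agg(["min", "max"]) on the real pivoted table gives a quick per-day check against the whisker ends that Bokeh draws.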

(*) The error mentioned in the comment above:
    ValueError: expected an element of either List(String) or List(Int), got [datetime.date(2009, 1, 1), datetime.date(2009, 1, 2)…

But the dtype of the pivoted DataFrame’s column labels is ‘object’:
Index([2009-01-01, 2009-01-02,…], dtype='object', name='a_date')

After I replace the “-” characters with “_”, the dtype of the column labels is still ‘object’, but this time Bokeh does not treat them as dates:
Index(['2009_01_01', '2009_01_02',…], dtype='object', name='a_date')
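One likely explanation, assuming a_date was created with .dt.date: dtype ‘object’ only says that the Index stores arbitrary Python objects, not that the labels are strings. The labels can already be datetime.date instances, which matches the datetime.date(2009, 1, 1) values in the error message. The sketch below shows this with made-up labels. It also suggests that astype(str) alone already yields string labels, so the “-” to “_” replacement may not be needed; whether the Bokeh version used here accepts hyphenated string labels has not been tested.

```python
import datetime

import pandas as pd

# An Index built from datetime.date objects reports dtype 'object'...
dates = pd.Index([datetime.date(2009, 1, 1), datetime.date(2009, 1, 2)], name="a_date")
print(dates.dtype)      # object
print(type(dates[0]))   # <class 'datetime.date'>

# ...while astype(str) turns each label into a plain string.
labels = dates.astype(str)
print(type(labels[0]))  # <class 'str'>
print(list(labels))     # ['2009-01-01', '2009-01-02']
```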

Could anybody help me by checking this behaviour and finding out why Bokeh draws invalid box plots?
For example, for 2009-01-12 the box plot shows a minimum near 2200, but the minimum of the whole dataset (for the whole year) is 2615.389. And many boxes have a maximum above 7000, although the maximum of the whole dataset is 6367.972. How is that possible?
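For comparison, matplotlib (which draws the pandas and seaborn plots) by default ends each whisker at the most extreme observation within 1.5 * IQR of the quartiles, so a whisker can never lie outside the observed data. A whisker at a value that does not occur in the data, like the minimum near 2200 here, would be consistent with drawing it at the raw fence Q1 - 1.5 * IQR (or Q3 + 1.5 * IQR) without clamping it to the data. That is only a guess about the cause, not something checked against Bokeh’s source. This sketch uses NumPy; the function name box_stats and the sample values are hypothetical.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, raw fences and Tukey-style whisker ends for one group (hypothetical helper)."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    # Raw fences: limits derived from the quartiles, not observed values.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme observations inside the fences,
    # so they always lie within the observed data range.
    inside = data[(data >= lower_fence) & (data <= upper_fence)]
    outliers = data[(data < lower_fence) | (data > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "lower_whisker": inside.min(),
        "upper_whisker": inside.max(),
        "outliers": outliers,
    }


# Made-up sample: the fences fall outside the data, the whiskers do not.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(stats["lower_fence"], stats["upper_fence"])      # -3.5 14.5
print(stats["lower_whisker"], stats["upper_whisker"])  # 1.0 10.0
```

Calling box_stats on each day’s readings (for instance each column of the pivoted table y, after dropping NaN values) should reproduce the pandas and seaborn whiskers; if the Bokeh whiskers match the fences instead, that would point to the cause.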

Thanks in advance!

Also, if I convert the matplotlib figure with
from bokeh import mpl
from bokeh.plotting import show
show(mpl.to_bokeh())

then the boxes are invisible when they are filled with colour; only the whiskers, medians and caps show up, whether the figure was drawn with pandas’ df.boxplot() or with seaborn :(
It would be great if this could be fixed, or if there were a workaround.

Thank you!

On Thursday, August 6, at 16:08:16 UTC+2, [email protected] wrote: