Is the new Bar API capable of plotting multiple series from multiple DataFrame columns?

Ignas_Brasiskis · November 4, 2015, 9:06pm

The old API had a very simple way of plotting multiple values:
http://bokeh.pydata.org/en/0.10.0/docs/gallery/stacked_bar_chart.html
However, with the new API everything is wrong; I found no way to replicate it.
Here is my attempt:
from collections import OrderedDict

import pandas as pd

from bokeh.charts import Bar, output_file, show
from bokeh.sampledata.olympics2014 import data

df = pd.io.json.json_normalize(data['data'])

# filter by countries with at least one medal and sort
df = df[df['medals.total'] > 0]
df = df.sort("medals.total", ascending=False)

# get the countries and we group the data by medal type
countries = df.abbr.values.tolist()
gold = df['medals.gold'].astype(float).values
silver = df['medals.silver'].astype(float).values
bronze = df['medals.bronze'].astype(float).values

# build a dict containing the grouped data
medals = OrderedDict(countries=countries, bronze=bronze, silver=silver, gold=gold)

# any of the following commented are also valid Bar inputs
medals = pd.DataFrame(medals)
#medals = list(medals.values())

output_file("stacked_bar.html")

bar = Bar(medals, values='gold', label="countries", title="Stacked bars")

show(bar)

However, the values keyword argument does not accept multiple columns. That's stupid.
You can replicate something similar by duplicating the table, putting everything in a single column, and adding a new column with the type of medal, then using that column as 'group'.
But why? Isn't the whole point of a bar plot to turn a data set with multiple columns into groups, without having to reshape the data?
Am I missing something?
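
For readers following along, the reshaping workaround described above can be sketched with `pd.melt`. This is only an illustration: the small frame below is a made-up stand-in for the olympics2014 sample data, and the column names `medal` and `count` are chosen here, not taken from Bokeh.

```python
import pandas as pd

# Hypothetical stand-in for the olympics2014 sample data (values are made up).
medals = pd.DataFrame({
    "countries": ["AAA", "BBB", "CCC"],
    "bronze": [1.0, 0.0, 2.0],
    "silver": [2.0, 1.0, 0.0],
    "gold": [3.0, 1.0, 0.0],
})

# Wide -> long: one row per (country, medal type) pair.
# "medal" and "count" are illustrative column names.
long_medals = pd.melt(
    medals,
    id_vars="countries",
    value_vars=["bronze", "silver", "gold"],
    var_name="medal",
    value_name="count",
)

print(long_medals)
```

The long frame has a single value column (`count`) plus a column naming the medal type, which is the shape the post describes passing to Bar with the medal column as the grouping column. The exact Bar keyword to use for that depends on the bokeh.charts version, so treat that part as an assumption.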

Bryan · November 5, 2015, 10:04pm

Hi Ignas,

First, please be respectful. Many people have worked very hard on Bokeh to improve it over the last few years, and although there are certainly many places that need more work or improvement, or are still being iterated on, calling things "stupid" does nothing to further useful discussion or collaboration.

The new API is very much centered around Pandas DataFrames, and statistical grouping and aggregations on those DataFrames. This kind of API has long been requested by many people. We are looking at how to improve support for "simpler" use cases. If you'd like to open an issue to discuss this on GH, it will be much easier to connect with the specific developers who work on the bokeh.charts API. In particular, something that could be extremely helpful would be an example of the code you would like to be able to write to do what you want. That kind of feedback can be extremely productive.

Thanks,

Bryan

Ignas_Brasiskis · November 6, 2015, 6:37am

Sorry that I sounded rude. Please accept my apologies. Honestly, I think Bokeh is one of the best plotting libraries out there, and I got overexcited.
I would be glad to open a GitHub issue and discuss or contribute something. When you refer to aggregation on DataFrames, is there an automatic aggregation feature for DataFrame columns here? If so, the documentation does not seem to mention or hint at it. Maybe that can be improved?

Nick_Roth · November 6, 2015, 7:36am

As Bryan mentioned, supporting the simple input cases is a priority once the rest of the plots are transitioned over. This plot is possible with the current API; it just requires using some features that are not well advertised. There are potential shortcuts that could be added so that an individual chart can infer what you mean when you provide multiple columns as values.

As for the "why": from a generalized chart standpoint, this data might make sense for this form of bar chart, but in the real world most data is not pre-aggregated. It is normalized and stored in databases. Additionally, it can be aggregated into an endless number of formats involving some combination of tuples, dicts, nested dicts, arrays of dicts, dicts of arrays, and arrays of tuples, all to communicate through a data structure what we want the chart to do. Instead, we should just tell the chart what to do.

So, if you have this normalized data, you must go through some number of manipulations to get it into this pivoted format. Then, if the chart turns out not to be what you expected, you need to aggregate it some other way. At each step you may hit snags where you have to consult the documentation on how to manipulate the data. However, the key topic here isn't manipulating data; it is expressing how you would like the chart to represent your data. So initially it will be a bit more painful, but in the end it will be more flexible and much more useful for exploratory analysis and interactive plots.

I’d just suggest taking a look at this example file to see the currently better supported use cases: https://github.com/bokeh/bokeh/blob/master/examples/charts/file/bar_multi.py
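
To make the normalized vs. pre-aggregated distinction concrete, here is a minimal pandas-only sketch. The records are made up (one row per medal awarded), and it is not how bokeh.charts aggregates internally; it only shows that the wide table in the original post is one of several aggregates of the same normalized data.

```python
import pandas as pd

# Hypothetical normalized records: one row per medal awarded (made-up data).
events = pd.DataFrame({
    "country": ["AAA", "AAA", "BBB", "AAA", "CCC", "BBB"],
    "medal": ["gold", "silver", "gold", "gold", "bronze", "silver"],
})

# One aggregation: medals per country and medal type (long form).
per_type = events.groupby(["country", "medal"]).size().reset_index(name="count")

# A different question only needs a different grouping, not a new data layout.
per_country = events.groupby("country").size().reset_index(name="count")

# The pre-aggregated wide table is one particular pivot of the same data.
wide = per_type.pivot(index="country", columns="medal", values="count").fillna(0)

print(per_type)
print(per_country)
print(wide)
```

Each chart variation here corresponds to a different `groupby`, which is the manual step a grouping/aggregating chart API is meant to take over.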

JMk · November 6, 2015, 2:16pm

Looks awesome. It would also be cool to be able to pass in or register a custom Python/Numba function for the aggregation.

Fabio_Pliger · November 6, 2015, 2:31pm

I just wanted to highlight what Bryan and Nick already mentioned. Charts is new and is definitely far more robust and better engineered than the previous versions. Although it may seem simple, creating a powerful yet simple interface for charts that covers the plethora of use cases is not a trivial task. That's why we decided to start from what we considered a solid base for the complex use cases, which we can then build on to tackle the simpler and different use cases as well.

That said, we have been constantly discussing and improving the API and, again, almost everything that has been discussed here is a target for the work in the coming weeks. I highly encourage everyone interested to join the discussion and the work by opening issues with questions, suggestions, or requests (like the ones in this thread), or PRs with code contributions or examples. It's very useful for us!

Thanks

Fabio

Fabio Pliger

Senior Software Engineer, Bokeh