Column oriented input data for high level Charts

Nick_Roth · April 16, 2015, 3:17pm

There was some previous discussion about what kind and what level of pre-processed inputs should be accepted by the Charts. I glanced through and didn’t find a conclusion on it, so I wanted to ask. I think it comes down to this: will Charts support both pre-formatted data and a ggplot-like interface, or will they stay a bit lower level and accept pre-formatted data only?

Looking at the Bar chart builder, it seems to require the pre-grouping of data. This tends to lean more towards flexibility from the standpoint of the chart, but away from quick plots from dataframe-like structures. I have been looking to spend a little more time on the general faceting functionality and integrating some Charts into crossfilter, so I was just curious where the interface is headed.

I see some reference to what I was thinking here (maybe we need the same kind of specification for other chart types):

Scatter Charts in BEP 3

but see some integration with the pandas groupby here:

Scatter Builder adapt values

IMO, it seems like the data adapter should operate kind of like AES in ggplot, in that it adapts column oriented data and an aesthetic specification into the data structure that the chart is looking for. Otherwise, if the data is already in a format it can plot directly, it doesn’t do anything. Having each chart implement its own handling of any data grouping seems like it would get a bit messy. I think this is where I started looking into generic grouping capabilities that could be used by the data adapter or column data source, then considered blaze as the internal engine for adapting/grouping data.
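To make the idea concrete, here is a minimal sketch of what such an adapter might look like. It is not the Charts API: the `adapt` function, its `x`/`y`/`group` parameters, and the sample columns are all hypothetical, and plain dicts of lists stand in for a dataframe-like structure.

```python
from collections import defaultdict


def adapt(columns, x, y, group=None):
    """Hypothetical adapter: turn column-oriented data plus an
    aesthetic-style spec (x, y, group) into one series per group."""
    # Fail early if the spec names a column that is not in the data.
    for name in (x, y, group):
        if name is not None and name not in columns:
            raise KeyError(f"unknown column: {name!r}")

    # Without a grouping column, everything is a single series.
    if group is None:
        return {"all": {"x": list(columns[x]), "y": list(columns[y])}}

    # Otherwise split the rows by the value found in the group column.
    series = defaultdict(lambda: {"x": [], "y": []})
    for key, xv, yv in zip(columns[group], columns[x], columns[y]):
        series[key]["x"].append(xv)
        series[key]["y"].append(yv)
    return dict(series)


# Made-up column-oriented input, one list per column.
data = {
    "species": ["a", "b", "a", "b"],
    "length": [1.0, 2.0, 3.0, 4.0],
    "width": [0.1, 0.2, 0.3, 0.4],
}

grouped = adapt(data, x="length", y="width", group="species")
ungrouped = adapt(data, x="length", y="width")
```

The point of the sketch is that the grouping lives in one place, so an individual chart would only ever see the per-series structure and would not need its own grouping logic.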

Fabio_Pliger · April 16, 2015, 3:51pm

Hi Nick,

This is perfect timing, as we have been refining some improvements to add to and shape the Charts API. Here’s the PR where this work and the discussion are evolving.

It’d be great if you could join the discussion and give your feedback.

There was some previous discussion about what kind and what level of pre-processed inputs should be accepted by the Charts. I glanced through and didn’t find a conclusion on it, so I wanted to ask. I think it comes down to this: will Charts support both pre-formatted data and a ggplot-like interface, or will they stay a bit lower level and accept pre-formatted data only?

Looking at the Bar chart builder, it seems to require the pre-grouping of data. This tends to lean more towards flexibility from the standpoint of the chart, but away from quick plots from dataframe-like structures. I have been looking to spend a little more time on the general faceting functionality and integrating some Charts into crossfilter, so I was just curious where the interface is headed.

Ah, that’s perfect timing again. I spent some time last week exploring Crossfilter in order to integrate Charts. Everything is closely related to the way Charts evolve. Briefly, here’s what I think:

Charts need to change in the direction of the PR linked above. This will let users just “pass in” their data (DataFrame-like structures), which gets “injected” into their DataSource. Only charts that need grouped or computed data add their own derived data (that data is tied specifically to those charts). Please note that this PR also lets you pass ColumnDataSources as chart inputs, selecting only the keys you want to use for x/y. This opens up a lot of interesting scenarios. You can see a few in the example I’ve added to this PR.
Charts should drop DataAdapter in favor of pd.DataFrame as the base structure.
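
As a rough illustration of the “select only the keys you want” idea, the sketch below uses a plain dict of lists in place of a ColumnDataSource. The `select_xy` helper and the column names are hypothetical and are not taken from the PR.

```python
def select_xy(source, x, y):
    """Hypothetical helper: pick only the x/y columns a chart should
    plot from a column-oriented source, leaving the source untouched."""
    missing = [name for name in (x, y) if name not in source]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    if len(source[x]) != len(source[y]):
        raise ValueError("x and y columns must have the same length")
    return {"x": list(source[x]), "y": list(source[y])}


# One shared source; each chart names the keys it wants to use.
source = {
    "date": [1, 2, 3],
    "open": [10.0, 11.0, 12.0],
    "close": [10.5, 10.8, 12.4],
}

open_series = select_xy(source, x="date", y="open")
close_series = select_xy(source, x="date", y="close")
```

Here two charts could share the same source while plotting different columns, so the user would not have to filter the data up front.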

I see some reference to what I was thinking here (maybe we need the same kind of specification for other chart types):

Scatter Charts in BEP 3

but see some integration with the pandas groupby here:

Scatter Builder adapt values

IMO, it seems like the data adapter should operate kind of like AES in ggplot, in that it adapts column oriented data and an aesthetic specification into the data structure that the chart is looking for. Otherwise, if the data is already in a format it can plot directly, it doesn’t do anything. Having each chart implement its own handling of any data grouping seems like it would get a bit messy. I think this is where I started looking into generic grouping capabilities that could be used by the data adapter or column data source, then considered blaze as the internal engine for adapting/grouping data.

I’d love to follow up on this discussion in the context of this charts PR. It’s very important. In general, I think charts should be as flexible as possible and do the minimum necessary to consume the data passed by the user (who shouldn’t need to filter it up front but should instead tell charts what to consider and how).

Please join the conversation on GH and give your feedback. I’d also really like to talk about the integration with CrossFilter, what I’ve been doing and my opinion on how it could evolve to integrate Charts (and more).

Thanks!

Fabio
