One or many ColumnDataSources?

When declaring multiple sets of data (e.g. multiple x=[0,1,2,3], y=[0,1,4,9] pairs) in JSON, it seems there are two possible approaches.

In one (e.g. anscombe.py) you declare a single ColumnDataSource that contains all the data arrays.

In the other (e.g. stocks.py) you declare a ColumnDataSource for each x, y pair.

Which is the preferred approach? And, most importantly, are there situations where only one approach is valid?

I know one case is multiple lines on one plot and the other is multiple plots, but at least in principle it would be possible to use either approach in either situation. Am I missing something?

Hi,

···

On Fri, Jun 20, 2014 at 4:58 PM, Samuel Colvin [email protected] wrote:


The rule you should follow is that one data source should contain arrays of equal length. BokehJS assumes this in order to support scalar values, i.e. Circle(x=[1,2,3], y=[1,2,3], radius=0.5) (radius being the interesting part here). So you will have to use a separate data source for each glyph in most cases.
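To make the equal-length rule concrete, here is a minimal sketch (a hypothetical `broadcast` helper, not Bokeh's actual code) of how a scalar property like radius is conceptually expanded across the rows of a source, which only works if the source has a single well-defined row count:

```python
def broadcast(value, n):
    """Expand a scalar to n rows; pass equal-length sequences through.

    A toy model of the behavior described above, not Bokeh's implementation.
    """
    if isinstance(value, (list, tuple)):
        if len(value) != n:
            raise ValueError("column length does not match the source's row count")
        return list(value)
    return [value] * n

x = [1, 2, 3]
y = [1, 2, 3]
n = len(x)  # the row count is only well defined if all columns agree

radius = broadcast(0.5, n)  # scalar: expanded to [0.5, 0.5, 0.5]
ys = broadcast(y, n)        # sequence: passed through unchanged
```

If the columns disagreed in length there would be no single `n` to broadcast against, which is why glyphs with differently sized data need separate sources.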

Mateusz

You received this message because you are subscribed to the Google Groups “Bokeh Discussion - Public” group.

To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].

To post to this group, send email to [email protected].

To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/e5542903-0d04-48ac-8fe8-07e046f6f791%40continuum.io.

For more options, visit https://groups.google.com/a/continuum.io/d/optout.

As Mateusz mentioned, the implicit assumption is that the columns in a given data source all have the same size. This could probably be stated more often, and more clearly, in the docs. Some context might be helpful as well: the main reason for this is that data sources also store a current selection, and the selection indices can only make sense across different columns if the columns have the same length. This was probably a convenient thing to do early on, but I think we may have to revisit this model in the future.
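A tiny illustration of that point (toy code, not Bokeh's internals): a source's selection is a single list of row indices applied to every column at once, so it is only meaningful when the columns share one length:

```python
# One source holding equal-length columns, plus a current selection.
data = {"x": [0, 1, 2, 3], "y": [0, 1, 4, 9]}
selected = [1, 3]  # hypothetical selection: row indices into the source

# The same indices are applied to every column simultaneously.
subset = {name: [col[i] for i in selected] for name, col in data.items()}
# subset == {"x": [1, 3], "y": [1, 9]}

# If "y" had only two entries, index 3 would be out of range for it,
# and the shared selection would stop making sense.
```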

Bryan

···

On Jun 20, 2014, at 9:58 AM, Samuel Colvin <[email protected]> wrote:


re: selections - I would love to see us tackle that ASAP; a better selection model would enable much better things with server-side data.

re: data sources - Is there a reason not to constrain column data sources to having the same column length? Personally, I would like to do that and just throw errors on construction if it's violated.
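That suggestion could look something like the following sketch (a hypothetical `StrictSource` class, not part of Bokeh's API), where unequal column lengths are rejected at construction time:

```python
class StrictSource:
    """Reject unequal column lengths up front, as suggested above.

    A hypothetical sketch, not part of Bokeh.
    """

    def __init__(self, **columns):
        lengths = {name: len(col) for name, col in columns.items()}
        if len(set(lengths.values())) > 1:
            raise ValueError(f"columns must have equal lengths, got {lengths}")
        self.data = columns

ok = StrictSource(x=[0, 1, 2, 3], y=[0, 1, 4, 9])  # constructs fine
# StrictSource(x=[0, 1, 2, 3], y=[0, 1])           # would raise ValueError
```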

···

On 06/20/2014 11:45 AM, Bryan Van de Ven wrote:


Hugo, can you write up your thoughts on selections on a GH wiki page or similar when you get a chance? It seems we should maybe make a proper selection object that is decoupled from data sources, but I would like to see your thoughts on this.

Regarding hard checks on column data source columns, I think that is fine, or at the very least, loud warnings.

Bryan

···

On Jun 21, 2014, at 3:39 AM, hugo <[email protected]> wrote:


I just lost the last 3 hours to this implicit assumption about equal column lengths. At the very least, we strongly need to warn when the lengths are unequal…

···

On Sun, Jun 22, 2014 at 2:49 PM, Bryan Van de Ven [email protected] wrote:



Hi,

···

On Tue, Jun 24, 2014 at 8:34 AM, Damian Avila [email protected] wrote:

I just lost the last 3 hours to this implicit assumption about equal column lengths. At the very least, we strongly need to warn when the lengths are unequal…

We shouldn't warn; this should be a validation error, otherwise people will simply ignore or overlook the warning and end up in the same place. Optimally, this should be fixed at the bokehjs level, so that scalars are expanded at the glyph level, not at the data source level; or, even better, not expanded at all, using generators or some other abstraction over arrays and scalars.
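That "abstraction over arrays and scalars" could be sketched like this (a hypothetical `column_iter` helper, not what BokehJS actually does): wrap scalars lazily so rendering code can iterate every column uniformly without materializing the expanded array:

```python
from itertools import repeat

def column_iter(value, n):
    """Yield n values whether `value` is a sequence or a scalar.

    A hypothetical sketch of the lazy abstraction suggested above.
    """
    if hasattr(value, "__len__"):
        if len(value) != n:
            raise ValueError("sequence length must match the row count")
        return iter(value)
    return repeat(value, n)  # lazy: the scalar is never expanded into a list

# Rendering-style loop: scalars and arrays look the same to the consumer.
rows = list(zip(column_iter([1, 2, 3], 3), column_iter(0.5, 3)))
# rows == [(1, 0.5), (2, 0.5), (3, 0.5)]
```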

Mateusz


Yes, I think we should make optimizing scalars a task for 0.6. For now, I think we should raise an error in Python and log to the console in BokehJS.

···
