Using pandas DataFrame as a ColumnDataSource

I’ve seen examples where you can use your dataframes directly to provide data. I’ve not gotten this to work well because:

  1. I don’t know how to extract the index by ‘name’. There are no examples of this I’ve found.

  2. Attempts to get a column I know is in the dataframe fail with:

check.py: ERROR: E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: foo

I print the columns right before I try to make the line call:

    source = ColumnDataSource(df)

    columns = list(df)
    print('Columns: %s' % columns)

    for col in columns:
        p.line(x=range(len(df)), y=col, line_width=2)

Is this the right way to go about this?

Thanks,

-Clint

Sorry, my mistake. I neglected to add the source=source parameter. This gets me back to issue 1 (I don't know how to refer to a df index), because the message Bokeh now prints, verbatim, is:

    Supplying a user-defined data source AND iterable values to glyph methods is
    not possibe. Either:

    Pass all data directly as literals:

        p.circe(x=a_list, y=an_array, ...)

    Or, put all data in a ColumnDataSource and pass column names:

        source = ColumnDataSource(data=dict(x=a_list, y=an_array))
        p.circe(x='x', y='x', source=source, ...)

Thanks,

-Clint
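
For readers following along, here is a minimal sketch of the two options that message describes. It assumes a small made-up DataFrame; the column names foo and bar and the figures p1 and p2 are placeholders, not taken from the original script. The second loop relies on the default, unnamed pandas index being exposed under the name 'index'.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical data standing in for the real dataframe.
df = pd.DataFrame({"foo": [1, 3, 2], "bar": [2, 1, 3]})
xs = list(range(len(df)))

# Option 1: pass every coordinate as literal data and give no source.
p1 = figure()
for col in df.columns:
    p1.line(x=xs, y=df[col].tolist(), line_width=2)

# Option 2: pass only column names and give the source.
# The unnamed default index is available under the name 'index'.
source = ColumnDataSource(df)
p2 = figure()
for col in df.columns:
    p2.line(x="index", y=col, line_width=2, source=source)
```

Mixing the two styles (a literal range for x together with source=source) is what triggers the message above.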

Hi,

If the Pandas index is named, that name is used as the column name. Otherwise, 'index' is used as the column name. [1] CDS has methods to let you inspect the column names, or you can always just look at .data, too.

Thanks,

Bryan

[1] Unless there is another non-index column named 'index' but I would advise trying to avoid that.
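
As a rough illustration of the naming rule described above, the sketch below builds one source from a DataFrame with an unnamed index and one from a DataFrame whose index is named. The frames and the index name when are invented for the example; the column names are read from .data.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical frames: one with the default unnamed index, one with a named index.
unnamed = pd.DataFrame({"val": [1, 2, 3]})
named = pd.DataFrame({"val": [1, 2, 3]},
                     index=pd.Index([10, 20, 30], name="when"))

src_unnamed = ColumnDataSource(unnamed)
src_named = ColumnDataSource(named)

# The keys of .data are the column names a glyph can refer to.
unnamed_cols = sorted(src_unnamed.data.keys())
named_cols = sorted(src_named.data.keys())
print(unnamed_cols)  # should contain 'index' and 'val'
print(named_cols)    # should contain 'val' and 'when'
```

Printing the keys like this right before the glyph call is a quick way to see which name the index actually received.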

--
You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/78588684-6977-4d27-bd5c-75419eab9b5f%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.

After some experimentation, my dataframe index is unnamed (None) and yet ‘index’ isn’t accepted by the line() function.

Thanks,

-Clint

Hi,

Please post a complete minimal script somewhere that represents what you are trying to do; there is too much speculation otherwise.

Thanks,

Bryan

On Jul 25, 2018, at 21:14, Clint Olsen <[email protected]> wrote:

I was able to get line() to accept:

p.line(x=df.index.name, y=col, line_width=2, source=source)

However, the graph it produces is very weird (and so are indices on the x-axis). I'm guessing that might have to do with me faking categorical data as you mentioned before. I'm not sure what it's doing yet.

Thanks,

-Clint
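
One possible explanation for the odd x-axis, offered only as a guess since the real data is not shown: if the named index holds strings, the figure needs a categorical x range listing those strings. The sketch below assumes a made-up frame with a string index called name.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Hypothetical frame with a named index of string labels.
df = pd.DataFrame({"val": [0.25, 0.33, 1.0]},
                  index=pd.Index(["a", "b", "c"], name="name"))
source = ColumnDataSource(df)

# Passing a list of strings as x_range makes the x-axis categorical.
p = figure(x_range=list(df.index))
p.line(x=df.index.name, y="val", line_width=2, source=source)
```

With a numeric index this step is not needed, because the default numeric range applies.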

Hi:

This should be about right.

    import pandas as pd
    from bokeh.plotting import figure, output_file, save, ColumnDataSource

    data = {'name': ['a', 'b', 'c', 'd', 'e'], 'attr': ['g', 'h', 'i', 'j', 'k'], 'val': [0.25, 0.33, 1.0, 1.25, 2]}

    df = pd.DataFrame(data=data)

    pv = pd.pivot_table(df, index=['name', 'attr'], values='val')

    p = figure()

    source = ColumnDataSource(pv)

    #p.line(x=None, y='val', source=source)
    p.line(x='index', y='val', source=source)
    output_file('foo.html')
    save(p)

E-1001 (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: index [renderer: GlyphRenderer(id='7cbabf41-5ff7-496c-a682-442db5f0e3e1', ...)]

So, is this test case not sufficient to demonstrate a problem, or am I just off in the weeds?

Thanks,

-Clint

Hi,

There's no need for this kind of prodding. The last post was right before the weekend, and believe it or not, OSS maintainers do sometimes need to take actual time off from doing unpaid work to answer support questions in order to avoid burnout.

Regarding the question: it's a multi-index, so the column name is the index level names joined with underscores ('name_attr' here), which can be seen immediately by inspecting the contents of the CDS directly:

  In [13]: pv = pd.pivot_table(df, index=['name', 'attr'], values='val') # index=['name', 'attr'] !

  In [14]: source = ColumnDataSource(pv)

  In [15]: source.data

  Out[15]:
  {'val': array([0.25, 0.33, 1. , 1.25, 2. ]),
   'name_attr': array([('a', 'g'), ('b', 'h'), ('c', 'i'), ('d', 'j'), ('e', 'k')],
         dtype=object)}

This open PR:

  https://github.com/bokeh/bokeh/pull/8093

adds more docs regarding this specific case.

For a multi-index it will also be necessary to set up an appropriate categorical range for nested categories:

  https://bokeh.pydata.org/en/latest/docs/user_guide/categorical.html#nested-categories

Bryan
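
For anyone reading later, a minimal sketch of how the pieces above might fit together, reusing the data from the posted test case. It assumes the flattened index column holds (name, attr) tuples as in the session output above, and it uses FactorRange from bokeh.models for the nested categories; treat it as one way to wire this up rather than the only one.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure

data = {'name': ['a', 'b', 'c', 'd', 'e'],
        'attr': ['g', 'h', 'i', 'j', 'k'],
        'val': [0.25, 0.33, 1.0, 1.25, 2]}
df = pd.DataFrame(data=data)
pv = pd.pivot_table(df, index=['name', 'attr'], values='val')

source = ColumnDataSource(pv)

# The multi-index is flattened into one column of (name, attr) tuples.
factors = [tuple(f) for f in source.data['name_attr']]

# Nested categories need a FactorRange built from those tuples.
p = figure(x_range=FactorRange(*factors))
p.line(x='name_attr', y='val', source=source)
```

Calling save(p) or show(p) afterwards renders the plot with a two-level categorical axis.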

Thanks for letting me know how to specify a multi-index by name (elements separated by '_'). I tried reading the link about specifying categorical ranges, but I find it so confusing that I don't know what to give it so that the result isn't mangled.

Originally, when I provided the x and y data directly, I just assigned a numerical range, p.line(x=range(len(df))…), and this, along with p.xaxis.major_label_overrides to label the x-axis the way I wanted (not with integers), worked fine.

-Clint
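
For reference, the numeric-range approach described in this post can be sketched roughly as follows. The frame and its labels are invented for the example, and fixing the ticks at the integer positions is an added assumption so that every override has a tick to attach to.

```python
import pandas as pd
from bokeh.plotting import figure

# Hypothetical frame whose index holds the labels wanted on the x-axis.
df = pd.DataFrame({"val": [0.25, 0.33, 1.0]}, index=["a", "b", "c"])
xs = list(range(len(df)))

p = figure()
# Plot against plain integer positions, passing literal data only.
p.line(x=xs, y=df["val"].tolist(), line_width=2)

# Put ticks exactly at those positions and relabel them with the index values.
p.xaxis.ticker = xs
p.xaxis.major_label_overrides = {i: str(label) for i, label in enumerate(df.index)}
```

This keeps the x-axis numeric, so no categorical range is involved.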
