Unused datetime fields cause errors

I was surprised to find that having a dataframe with a datetime column that contains a NULL, even if that column is not used, causes an error. (ValueError: NaTType does not support timetuple) Is this expected behavior?

Here is a simple example. Attached is the jupyter notebook I used.

import pandas as pd
pd.options.display.max_rows = 10

from bokeh.plotting import figure, output_notebook, output_file, show
output_file('broken.html')

# some fake data
data = pd.DataFrame({'time':['09:30','10:00','10:05','10:30'], 'prc':[5,5.25,6,7]})
data.time = pd.to_datetime(data.time)
# data['time_not_used'] = data.time # causes no problems

# create a column we will not use, but set a value to NaT
data['time_not_used'] = data.time.where(data.time!='2017-02-13 10:05:00')

print(data.info())

print(data)

p = figure(x_axis_type='datetime')
p.circle(x='time', y='prc', source=data, color='red', size=10)
show(p) # throws "ValueError: NaTType does not support timetuple"

ErrorExample.ipynb (27 KB)

Hi,

Bokeh has no way of knowing whether a column is truly "used" or not. For example, you might refer to an "unused" column in a CustomJS callback; Bokeh cannot determine this, so it has to act conservatively. Accordingly, anything that goes into a CDS (ColumnDataSource) is "used", in the sense that it is serialized and included in the document. The corollary is that every type in a CDS has to be serializable. I'm guessing Bokeh does not know what to do with NaT values. There's certainly no corresponding type on the JS side. Maybe they can just be converted to nulls, but that will require discussion and/or investigation, so I'd suggest opening a GitHub issue.

Thanks,

Bryan
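Since every column in the source gets serialized, it can help to find out up front which datetime columns contain NaT. The following is only a rough sketch using pandas; the helper name `datetime_columns_with_nat` is made up for illustration and is not part of Bokeh or pandas, and the data uses explicit dates so the result does not depend on the day it is run.

```python
import pandas as pd


def datetime_columns_with_nat(df):
    """Return the names of datetime columns holding at least one NaT.

    Hypothetical helper for illustration; not a Bokeh or pandas API.
    """
    return [
        name
        for name in df.columns
        if pd.api.types.is_datetime64_any_dtype(df[name])
        and df[name].isna().any()
    ]


# Data shaped like the original post, but with explicit dates.
data = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-02-13 09:30", "2017-02-13 10:00",
        "2017-02-13 10:05", "2017-02-13 10:30",
    ]),
    "prc": [5, 5.25, 6, 7],
})

# Same construction as the post: the 10:05 entry becomes NaT.
cutoff = pd.Timestamp("2017-02-13 10:05:00")
data["time_not_used"] = data["time"].where(data["time"] != cutoff)

print(datetime_columns_with_nat(data))
```

Any column this reports is a candidate for removal, or for filling in, before the DataFrame is handed to a plot as `source`.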

--
You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/f462c374-0fae-45e2-adc6-616e34a2eb1e%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.

I should add, as an immediate workaround, things should work if you delete such unused columns from the CDS .data dict before generating output.

Bryan
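One way to apply this workaround is to trim the DataFrame before it ever reaches Bokeh, rather than deleting keys from the CDS `.data` dict afterward. This is a minimal sketch under that assumption; `select_plot_columns` is a hypothetical helper name, and the data uses explicit dates for reproducibility.

```python
import pandas as pd


def select_plot_columns(df, needed):
    """Return a copy of df holding only the columns listed in `needed`.

    Hypothetical helper for illustration; not a Bokeh or pandas API.
    """
    missing = [name for name in needed if name not in df.columns]
    if missing:
        raise KeyError(f"columns not in DataFrame: {missing}")
    return df[list(needed)].copy()


data = pd.DataFrame({
    "time": pd.to_datetime([
        "2017-02-13 09:30", "2017-02-13 10:00",
        "2017-02-13 10:05", "2017-02-13 10:30",
    ]),
    "prc": [5, 5.25, 6, 7],
})
# The problematic extra column: one entry is NaT.
cutoff = pd.Timestamp("2017-02-13 10:05:00")
data["time_not_used"] = data["time"].where(data["time"] != cutoff)

# Keep only what the glyph actually references.
plot_data = select_plot_columns(data, ["time", "prc"])
print(list(plot_data.columns))
```

`plot_data` can then be passed as `source=plot_data` in the `p.circle(...)` call from the original post, so the NaT column is never part of what gets serialized.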


The other workaround is to avoid the "source" parameter and pass in only the columns you need:
p.circle(x=data.time, y=data.prc, color='red', size=10)

– John
