I was surprised to find that having a dataframe with a datetime column that contains a NULL, even if that column is not used, causes an error. (ValueError: NaTType does not support timetuple) Is this expected behavior?
Here is a simple example. Attached is the jupyter notebook I used.
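import pandas as pd
pd.options.display.max_rows = 10
from bokeh.plotting import figure, output_notebook, output_file, show
output_file('broken.html')
# some fake data
data = pd.DataFrame({'time':['09:30','10:00','10:05','10:30'], 'prc':[5,5.25,6,7]})
data.time = pd.to_datetime(data.time)
# data['time_not_used'] = data.time # causes no problems
# create a column we will not use, but set a value to NaT
data['time_not_used'] = data.time.where(data.time!='2017-02-13 10:05:00')
print(data.info())
print(data)
p = figure(x_axis_type='datetime')
p.circle(x='time', y='prc', source=data, color='red', size=10)
show(p) # throws "ValueError: NaTType does not support timetuple"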
Hi,
Bokeh has no way of knowing whether a column is truly "used" or not. For example, you might refer to an "unused" column in some CustomJS callback; Bokeh cannot determine this, so it has to act conservatively. Accordingly, anything that goes into a CDS is "used", in the sense that it is serialized and included in the document. The corollary is that every type in a CDS has to be serializable. I'm guessing Bokeh does not know what to do with NaT values. There's certainly no corresponding type on the JS side. Maybe they can just be converted to nulls, but that will require discussion and/or investigation, so I'd suggest opening a GitHub issue.
Bryan
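Since every column handed to a CDS gets serialized, it can help to check a DataFrame for datetime columns containing NaT before passing it to Bokeh. The following is only a sketch on the pandas side: `columns_with_nat` is a hypothetical helper name, not part of pandas or Bokeh, and the sample frame uses full timestamps (an assumption) so that the NaT row can be selected unambiguously.

```python
import pandas as pd


def columns_with_nat(df):
    """Return the names of datetime columns that contain at least one NaT.

    Hypothetical helper for illustration; not part of pandas or Bokeh.
    """
    datetimes = df.select_dtypes(include="datetime")
    return [name for name in datetimes.columns if datetimes[name].isna().any()]


# Fake data modeled on the example in this thread.
data = pd.DataFrame({
    "time": ["2017-02-13 09:30", "2017-02-13 10:00",
             "2017-02-13 10:05", "2017-02-13 10:30"],
    "prc": [5, 5.25, 6, 7],
})
data["time"] = pd.to_datetime(data["time"])

# Column we will not plot, with one value replaced by NaT.
data["time_not_used"] = data["time"].where(
    data["time"] != pd.Timestamp("2017-02-13 10:05:00")
)

print(columns_with_nat(data))
```

Any column this reports is a candidate for the serialization error, whether or not a glyph refers to it.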
I should add, as an immediate workaround, things should work if you delete such unused columns from the CDS .data dict before generating output.
Bryan
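The workaround above might look roughly like this. It is a sketch rather than a tested recipe for any particular Bokeh version: `unused_columns` and `build_source` are hypothetical helper names, and the sketch assumes `ColumnDataSource.remove` is available to drop a column from the source's data. It also uses `p.scatter` where the original example used `p.circle`.

```python
import pandas as pd


def unused_columns(df, used):
    """List the DataFrame columns that the plot does not refer to."""
    return [name for name in df.columns if name not in used]


def build_source(df, used):
    """Build a ColumnDataSource, then drop every column not in `used`.

    Hypothetical helper; Bokeh is imported here so the pandas part of the
    example can be read and run on its own.
    """
    from bokeh.models import ColumnDataSource

    source = ColumnDataSource(df)
    for name in unused_columns(df, used):
        source.remove(name)  # removes the column from source.data
    return source


def build_figure(df):
    """Plot prc against time using only the columns the glyph needs."""
    from bokeh.plotting import figure

    source = build_source(df, used=["time", "prc"])
    p = figure(x_axis_type="datetime")
    p.scatter(x="time", y="prc", source=source, color="red", size=10)
    return p


# Fake data modeled on the example in this thread (full timestamps assumed).
data = pd.DataFrame({
    "time": pd.to_datetime(["2017-02-13 09:30", "2017-02-13 10:00",
                            "2017-02-13 10:05", "2017-02-13 10:30"]),
    "prc": [5, 5.25, 6, 7],
})
data["time_not_used"] = data["time"].where(
    data["time"] != pd.Timestamp("2017-02-13 10:05:00")
)

print(unused_columns(data, used=["time", "prc"]))
```

Calling `show(build_figure(data))` would then serialize only `time`, `prc`, and the index column that Bokeh adds for a DataFrame.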
The other workaround is to avoid using the "source" param and only pass in what you need.
p.circle(x=data.time, y=data.prc, color='red', size=10)
– John