HoverTool long int accuracy loss

Hi,

I’ve been trying to display long integers in the hover tool. They are correct going into the ColumnDataSource (dtype=np.int64) and when read back out (I’ve checked with source.to_df() and they remain correct). However, on the actual plot they are wrong: it tends to be the last few digits, as if accuracy has been lost during formatting. In the tooltips I’ve tried @source_id, @source_id{0} and @source_id{int}.
I can provide a worked example if required (I just need to sanitise the data set).
Thanks,
Will

@Will-Cooper have you tried the methods shown here?

Hi @Bryan,
Yeah, I’ve tried the numeral formatting methods like @source_id{0} and the printf formatting methods like @source_id{%d} (and %i, %u, %s), but they all lose accuracy.
For now I can get around the issue by casting the long int to a string before creating the ColumnDataSource, but it’s not an ideal solution.
I did some playing and found that when an int is longer than 17 digits, it always rounds to the 17th digit regardless of how it’s being formatted (bar casting it to a string beforehand); test code attached for clarity (hopefully).

import numpy as np
from bokeh.plotting import figure, show
from bokeh.models import HoverTool, ColumnDataSource
source_id = np.int64(np.random.randn(100) * 12345 + 123456789876543210)  # 18 digit length ids
# some data
x = np.linspace(1, 100, 100)
y = 2 * x + np.random.randn(100) * 3
# create source for plotting
source = ColumnDataSource(dict(x=x, y=y, source_id=source_id))
print(source.to_df()[:10])  # print first 10 values (displays correctly the ids)
# plotting
p = figure(title='Hover Tool Test', tools='', x_axis_label='x', y_axis_label='y', sizing_mode='stretch_width')
p.circle(source=source, x='x', y='y')
p.add_tools(HoverTool(tooltips=[('source_id', '@source_id')], formatters={'source_id': 'numeral'}))
show(p)

Cheers,
Will

@Will-Cooper CDS columns that are NumPy arrays are serialized in a way that automatically generates JavaScript typed arrays on the browser side. Typed arrays offer much better performance and other benefits. Unfortunately, for integers nothing wider than Int32Array is available. There is no 64-bit integer typed array available in browsers today that we could use even if we wanted to. Seems like a glaring omission and oversight to me, but I am not on the ECMA committee. :man_shrugging:

Some options:

  • Try a regular Python list as the CDS column instead of a NumPy array. This will be serialized as plain JSON and may work (NaNs/infs will be problematic, and performance may suffer depending on column size)
  • Format in Python and use strings, as you are doing now
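
A minimal sketch of the second option, assuming the ids are available as Python or NumPy integers. The helper name ids_as_strings is hypothetical and not part of Bokeh; the resulting dict is just ordinary column data that could be handed to a ColumnDataSource.

```python
def ids_as_strings(values):
    """Return the ids as decimal strings so no digit can be lost in the browser."""
    # int(v) also accepts NumPy integer scalars such as np.int64
    return [str(int(v)) for v in values]


source_id = [123456789876543210 + i for i in range(3)]  # 18-digit ids

data = dict(
    x=[1.0, 2.0, 3.0],
    y=[2.1, 3.9, 6.2],
    source_id=ids_as_strings(source_id),  # strings, not numbers
)

# `data` can then be passed to ColumnDataSource(data) and shown with '@source_id'
print(data["source_id"])
```

The trade-off is that the column is no longer numeric on the browser side, so it is only suitable for values that are displayed as labels rather than computed with.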

@Bryan Thank you for clarifying, I suppose long integers aren’t that common!
I tried the built-in Python list just now and it has the same rounding error, unfortunately…
Would it be possible to produce a warning, perhaps? It’s quite an easy effect to miss if you’re not triple-checking your results! Something simple like:

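import warnings
if source.data['source_id'].dtype == np.int64:  # any int64 CDS column
    warnings.warn('int64 column may lose precision in the browser')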

perhaps?
Thank you again!

@Will-Cooper Thinking more, if you have an np.int64 column it should already just be shunted into using plain JSON for the encoding, since Bokeh knows there is no 64-bit integer typed array type. In which case this is either some limitation of JS itself or of the formatting libraries, since we just pass the value as-is to the formatters. I would say in this case formatting the strings yourself in Python is probably your best bet.

Did you try other formatters? (e.g. “printf” listed in the docs link above)

@Will-Cooper Actually I think you are just exceeding the JS max safe integer value (Number.MAX_SAFE_INTEGER):

It is 2^{53} - 1, which is 9007199254740991 (16 digits)

Interactive web plotting is great, but JavaScript giveth, and JavaScript also taketh… :confused:
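
The limit can be reproduced without a browser, since a Python float is the same 64-bit IEEE 754 double that JavaScript uses for its numbers. The sketch below round-trips a few integers through float; as_js_number is a hypothetical helper name used only for illustration.

```python
MAX_SAFE_INTEGER = 2**53 - 1  # same value as JavaScript's Number.MAX_SAFE_INTEGER


def as_js_number(n):
    """Round-trip an int through a 64-bit float, the representation JS numbers use."""
    return int(float(n))


for n in (MAX_SAFE_INTEGER, 2**53 + 1, 123456789876543210):
    result = as_js_number(n)
    print(n, "->", result, "exact" if result == n else "changed")
```

The digits a browser prints for the changed values may look different from Python's, because JavaScript displays the shortest decimal that identifies the stored double, which is consistent with the rounding at around the 17th digit reported earlier in the thread.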

@Bryan Yes, it looks like that’s the case. I suppose the only working solution then is what I do now: casting to a string before creating the CDS.
Yeah I’ve tried all the different formatting structures in either numeral or printf; sorry I’m still very much new to the joys of JavaScript…
n.b. Could it be done in Bokeh, i.e. before sending the arrays to JavaScript, either type cast them as strings or raise a warning if the column is a long integer?

Could and Should are different questions, and regardless of technical considerations I would say that kind of automagic should not be done. For one, it would require the vast number of Bokeh users who don’t have to deal with long integers to pay the price of expensive full-array scans on every CDS column, all the time. Also, if the expectation is to be able to use the numbers as numbers in the browser, this would lead to all kinds of surprises.
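
For anyone who wants such a check anyway, it can live in user code and run only on the columns known to hold ids, which avoids the cost of scanning every column. A minimal sketch, assuming the column is an iterable of integers; warn_if_unsafe is a hypothetical helper, not a Bokeh function.

```python
import warnings

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer JavaScript numbers hold exactly


def warn_if_unsafe(name, values):
    """Warn (and return True) when a column holds ints beyond the JS safe range."""
    # This scans the whole column, so call it only on columns that need it
    unsafe = any(abs(int(v)) > MAX_SAFE_INTEGER for v in values)
    if unsafe:
        warnings.warn(
            f"column {name!r} has integers beyond 2**53 - 1; "
            "convert it to strings before plotting"
        )
    return unsafe


warn_if_unsafe("source_id", [123456789876543210, 42])
```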