How can I truncate trailing zeros in decimal numbers displayed when using scientific notation on axis labels?

joelostblom · May 30, 2020, 8:11pm

In the example below, there are trailing zeros that don’t add any extra information, so it would be nice to clean up the labels by controlling how many decimals are shown, without turning off scientific formatting completely by using bokeh.models.NumeralTickFormatter.

from bokeh.plotting import figure, show

p = figure()
p.circle((10_000, 100_000), (10_000, 100_000))
show(p)

I want the labels to be 2e+4 instead of 2.000e+4 and so on. I tried the bokeh.models.ScientificFormatter mentioned in this PR, but I can’t figure out how to use it for ticks; it seems to be available only for table cells. The error I get when using it directly is ValueError: expected an instance of type TickFormatter, got ScientificFormatter(id='296049', ...) of type ScientificFormatter, and it seems like I can’t pass it to NumeralTickFormatter either.

I realize that I can use PrintfTickFormatter(format="%4.1e") coupled with an if statement that checks whether the values in the column I am plotting are large enough to warrant scientific notation, but I am not sure whether there are edge cases (e.g. when I need something like 2.1e+4 instead of just 2e+4). It would be nice if I could just specify that trailing zeros should be truncated. Is this possible?

p-himik · May 31, 2020, 3:54am

The default formatter class is BasicTickFormatter. It has the precision property that allows you to control the number of digits after the decimal point. Just set the property to 0.
You won’t have any issues with 2.1e+4 unless you change the default ticker, an instance of BasicTicker - it generates ticks at regular predefined intervals. If a value is larger than 1, it will always be an integer.
You will have problems with values less than 1 - with precision = 0, they will become just 0.
If for some reason you find PrintfTickFormatter unsuitable, you can always use FuncTickFormatter and get complete control over the formatting process.
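A minimal sketch of the precision suggestion, assuming a Bokeh version in which BasicTickFormatter accepts an integer precision as described above. The helper fixed_labels is hypothetical: it uses Python's own fixed-point formatting only to illustrate why ticks below 1 collapse to 0 when no decimals are kept, and it is not Bokeh's rendering code.

```python
def fixed_labels(ticks, precision):
    """Illustration only: format ticks with a fixed number of decimals."""
    return [f"{t:.{precision}f}" for t in ticks]


def make_plot():
    # Imported lazily so the helper above runs without Bokeh installed.
    from bokeh.models import BasicTickFormatter
    from bokeh.plotting import figure

    p = figure()
    p.scatter([10_000, 100_000], [10_000, 100_000])
    # precision=0 keeps no digits after the decimal point on tick labels.
    p.xaxis.formatter = BasicTickFormatter(precision=0)
    p.yaxis.formatter = BasicTickFormatter(precision=0)
    return p


if __name__ == "__main__":
    # Sub-1 ticks lose all information once the decimals are dropped.
    print(fixed_labels([0.2, 0.4, 20000.0], 0))
```

Calling make_plot() and passing the result to bokeh.plotting.show should display the figure; whether the labels come out exactly as 2e+4 depends on the Bokeh version and the tick range.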

joelostblom · May 31, 2020, 4:22am

Thanks for the reply @p-himik. Sometimes the y-axis will have values between zero and 1, so even if I use BasicTickFormatter with precision=0, I will need an if statement to check the max value in the column and only apply the tick formatter if the values are high enough for scientific notation to be used.

Btw, would it be reasonable to open an issue suggesting that the scientific notation by default truncates trailing zeros? Or is there a purpose to writing 2.000e+4 instead of 2e+4 that I am missing?

p-himik · May 31, 2020, 4:36am

There is a purpose to it. At least, that’s my understanding. 2.000e+4 means that there are 0s in those spaces. 2e+4 means that the number might have been rounded/truncated.

You cannot have conditional formatters on the Python side. If you want to have some additional logic, you will have to use FuncTickFormatter or create your own formatter. With that being said, it’s nothing complicated. Likely, you will end up with just two lines of JavaScript.

Bryan · May 31, 2020, 4:42am

What about NumeralTickFormatter?

joelostblom · May 31, 2020, 5:58am

How can I truncate trailing zeros with NumeralTickFormatter? Or adjust precision for only scientific formatting? I didn’t see that when I went through the docs but I might just have missed it.

Bryan · May 31, 2020, 9:11pm

No, it’s not helpful. There is a mode that will truncate trailing zeros, which is what I was focused on, but it doesn’t do scientific notation.

I think what you are after is the "%g" format from C standard printf. Unfortunately the JS “printf” library (that PrintfTickFormatter is just a thin wrapper around) does not provide this mode. In the immediate term your best bet may be FuncTickFormatter.
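For readers unfamiliar with "%g", here is a small illustration using Python's printf-style formatting, which follows the C rules. It only shows what the format does and is not something Bokeh can run; the helper name g_label is hypothetical. Note that Python and C print at least two exponent digits, unlike the 2e+4 style seen on the Bokeh axis.

```python
def g_label(value, significant=6):
    """Format like C's printf %g: trailing zeros are dropped and the
    exponent form is used only when the number is large or small enough."""
    return "%.*g" % (significant, value)


for v in (20000, 2e10, 2.1e10, 0.25):
    print(v, "->", g_label(v))
# 20000 -> 20000
# 20000000000.0 -> 2e+10
# 21000000000.0 -> 2.1e+10
# 0.25 -> 0.25
```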

joelostblom · June 1, 2020, 6:43am

Thanks Bryan,
I will look into FuncTickFormatter in the future (since it seems like it requires JS). For now, I went with an if-statement checking the max value in the column and then the PrintfTickFormatter. I think it would be nice if Bokeh truncated trailing zeroes by default in scientific notation, and can open an issue for that (or a scientific tick formatter) if you think either of these is desirable.
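The if-statement workaround described here might look roughly like the sketch below. The function names and the 1e5 cutoff are assumptions made for illustration, not values read from Bokeh, and "%4.1e" still prints one fixed decimal (2.0e+4 rather than 2e+4).

```python
def choose_printf_format(values, sci_threshold=1e5, sci_format="%4.1e"):
    """Return a printf format for scientific labels, or None to keep the default.

    sci_threshold is an assumed cutoff chosen for this sketch.
    """
    largest = max(abs(v) for v in values)
    return sci_format if largest >= sci_threshold else None


def apply_formatter(axis, values):
    # Imported lazily so choose_printf_format works without Bokeh installed.
    from bokeh.models import PrintfTickFormatter

    fmt = choose_printf_format(values)
    if fmt is not None:
        axis.formatter = PrintfTickFormatter(format=fmt)
```

With a figure p and a data column y, this would be called as apply_formatter(p.yaxis, y), leaving the default formatter in place for small values.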

_jm · June 1, 2020, 11:41am

@joelostblom

I would respectfully make the counterpoint that trailing zeroes should not be removed by default. Trailing zeroes can, in general, be significant digits that convey precision of the data, and that can be important to retain for some consumers of the data.

joelostblom · June 5, 2020, 5:45am

Sorry for the late reply @_jm, I agree with you and @p-himik that trailing zeroes are good for indicating precision for data. However, I don’t think this applies to the same extent for axis ticks as these can be placed with arbitrary precision along the axis. So even if a tick label says 2e+4, it can safely be assumed that any plotting library places this tick mark as close as possible to 2.000...e+4. This is also how bokeh treats log axes, which are written as 10^3 etc, not 10.000^3 or 1.000e+3.

An advantage of scientific notation is briefer labels for large numbers, but including trailing zeroes limits this benefit. If the trailing zeros are important, one could instead suppress scientific notation and write out the full number. FWIW, this is also how scientific notation is described on Wikipedia, only significant digits are kept. And as far as I can tell, this is how many other popular libraries treat tick labels, e.g. matplotlib.

If changing the default is out of the question, I think it would be nice to have an easy way to keep only significant digits in scientific notation. Happy to open an issue to track either of these suggestions if you agree.

p-himik · June 5, 2020, 7:22am

Writing out numbers like 2.000e-300 is not great.

As I mentioned before, just use FuncTickFormatter:

p.xaxis.formatter = FuncTickFormatter(code="return tick.toExponential();")

That’s it, there’s no more code required to achieve this.

_jm · June 5, 2020, 10:24am

@joelostblom

My point about removing trailing digits from the axis label is that the bokeh formatters include a precision argument, which allows the meaningful digits to be retained. If the trailing zeroes are automatically removed, it contradicts the intended behavior of that argument and can confuse a user. (It would be counterintuitive to me at least.)

I appreciate the need to deviate and do things in a custom way for both technical and aesthetic reasons, and your point about having an easy way to do this is also valid in my view.

I believe the capability exists within bokeh to do this and it is straightforward using the solution provided by @p-himik above, viz.

p-himik:

As I mentioned before, just use FuncTickFormatter:

p.xaxis.formatter = FuncTickFormatter(code="return tick.toExponential();")

That’s it, there’s no more code required to achieve this.

So, I would argue that the issue is not one of providing a new feature but of documentation, so that other users with similar requirements can implement this in their software. Accepting the solution from @p-himik is probably sufficient.

Alternatively, if you have an example of default behaviors in other scientific plotting tools like MATLAB or matplotlib where bokeh behaves differently, it might compel more discussion about how the behavior can be integrated to bokeh to make it more friendly to new users.

joelostblom · June 5, 2020, 3:33pm

Thanks both for replying! And thanks for the FuncTickFormatter tip! It is better than the PrintfTickFormatter I was using since it only shows significant digits instead of a fixed number of decimals. It still has the limitation that I need an if-statement to check if the numbers on the axes are large enough to apply the formatter, whereas the suggested formatter specifically for scientific notation would be applied automatically whenever scientific notation is used.
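One way to avoid the Python-side if-statement is to put the check inside the JavaScript snippet, along the lines p-himik suggested earlier. The sketch below is an assumption-laden illustration: the helper names and the 1e5 cutoff are hypothetical, and to my knowledge newer Bokeh releases name the class CustomJSTickFormatter instead of FuncTickFormatter, so the import tries both.

```python
def build_tick_code(sci_threshold=1e5):
    """Build the JavaScript body for a tick formatter.

    `tick` is the variable Bokeh exposes to the snippet; sci_threshold is an
    assumed cutoff chosen for this sketch.
    """
    return (
        f"if (Math.abs(tick) >= {sci_threshold!r}) {{\n"
        "    return tick.toExponential();\n"
        "}\n"
        "return tick.toString();"
    )


def make_formatter(sci_threshold=1e5):
    # FuncTickFormatter is the class used in this thread; newer Bokeh
    # releases are assumed to provide CustomJSTickFormatter instead.
    try:
        from bokeh.models import CustomJSTickFormatter as JSFormatter
    except ImportError:
        from bokeh.models import FuncTickFormatter as JSFormatter
    return JSFormatter(code=build_tick_code(sci_threshold))
```

Assigning p.yaxis.formatter = make_formatter() would then switch to exponent labels only for ticks at or above the cutoff. Smaller ticks fall back to plain toString(), which differs from Bokeh's default formatting.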

To be clear, I don’t want to push for this feature if it is unwanted, and for my purposes it works well enough with an if-statement and manually changing the labels. But I do think the current behavior is unexpected (might just be me) and unusual based on what I have seen elsewhere, so I’ll try to explain more why I think that way and give some examples from other libraries below.

Writing out numbers like 2.000e-300 is not great

Agree, but with smaller negative exponents the current Bokeh scientific formatting would actually be incorrect since it introduces precision that might not exist (e.g. it would make 0.2 appear as 2.000e-1, which is actually 0.200). This seems to not be an issue in practice since bokeh avoids applying scientific formatting to numbers in this range.

For large numbers in general, trailing zeros are not deemed that important with the current truncation either, since you suppress 97 of them by typing 2.000e+100. To me, the logic behind writing 2e+100 instead is the same as why we write 2 instead of 2.000 (even if we had the original number at this higher precision). Those decimals are only important if the data varies over a very small range so that the next axis tick would be 2.125 or similar. This same logic is already applied in bokeh when formatting decimal numbers, where trailing zeros are truncated.

Alternatively, if you have an example of default behaviors in other scientific plotting tools like MATLAB or matplotlib where bokeh behaves differently, it might compel more discussion about how the behavior can be integrated to bokeh to make it more friendly to new users.

You made me curious so I went looking into more libraries =) I tried to find the default behavior, but as I am not proficient in all these languages, I might have misunderstood something. In any case, this is what I found (spoiler: none of the libraries I looked at include trailing zeros):

Matplotlib shows only significant digits followed by x 10^e (same example as linked above)

Mathematica is the same as matplotlib

ggplot is similar, just using different notation for the exponent part

Matlab keeps only significant digits and includes the exponent at the top of the graph.

Plotly keeps only significant digits and abbreviates by letter suffix instead of exponent.

I believe D3 does the same as plotly (but I am not knowledgeable enough to say if that’s the default or if it is explicitly set somewhere in this example)

vega/vega-lite/altair seem to always show the raw numbers, but their data exploration tool polestar has the same behavior as plotly and d3.

Sorry that this post became so long.

_jm · June 5, 2020, 3:51pm

@joelostblom

No apologies necessary; constructive discussion is good in my view.

Your point about how things are done in this example is well taken.

I personally find this behavior annoying in all packages (including MATLAB and matplotlib, which is designed to be MATLAB-like in syntax and behavior). And I often end up customizing the tick labels so that the labels are consistent, e.g. 0.010, 0.020 in your example. So, this is the opposite of what you’re trying to achieve in some respects.

I am just a user stating my preferences so I certainly don’t have a strong opinion about how things should work by default. And I can work within the choices provided by the designers in cases such as this.

Just pointing out when I see an argument like precision, I’d expect it to make things uniform and consistent when used. If that doesn’t hold in examples you see, I’d be interested to know if that’s by design or how it can be addressed if it is unintended behavior.

Bryan · June 5, 2020, 4:06pm

FWIW if someone wants to add a flag to some formatter to truncate trailing zeros, I’d be fine to merge it (it just needs to be off by default to preserve compatibility).

joelostblom · June 9, 2020, 4:32pm

Just a note for anyone stopping by and looking to format small numbers as exponentials instead of big ones. Using just FuncTickFormatter(code="return tick.toExponential();") will lead to issues with floating point precision.

Instead, you need to set the precision of the tick numbers before converting them to exponential notation, using FuncTickFormatter(code="return parseFloat(tick.toPrecision(12)).toExponential();").

This is due to how JavaScript handles floats. The parseFloat is needed because toPrecision returns a string.
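The effect can be reproduced outside the browser. The sketch below mirrors the two JavaScript expressions in Python for illustration; to_exponential and cleaned are hypothetical helpers that approximate toExponential() and parseFloat(tick.toPrecision(12)) for finite floats, not code that Bokeh runs.

```python
from decimal import Decimal


def to_exponential(x):
    """Approximate JavaScript's toExponential() called with no argument.

    Finite floats only. Uses the shortest round-tripping digits, as repr() does.
    """
    sign, digits, exp = Decimal(repr(float(x))).normalize().as_tuple()
    mantissa = str(digits[0])
    if len(digits) > 1:
        mantissa += "." + "".join(map(str, digits[1:]))
    power = exp + len(digits) - 1
    return f"{'-' if sign else ''}{mantissa}e{'+' if power >= 0 else '-'}{abs(power)}"


def cleaned(x, significant=12):
    """Mirror parseFloat(tick.toPrecision(12)): round to 12 significant digits."""
    return float(f"{x:.{significant}g}")


tick = 0.1 + 0.2  # stored as 0.30000000000000004
print(to_exponential(tick))           # 3.0000000000000004e-1
print(to_exponential(cleaned(tick)))  # 3e-1
```

Rounding to 12 significant digits discards the floating point noise while leaving far more precision than an axis label can show.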