
SpreadsheetML: How should consuming applications parse floating-point numbers?

I'm having some difficulty understanding Excel's handling of cell values which are not exactly representable in IEEE 754 floating point.

My motivating example can be achieved by saving a spreadsheet from Excel 2010 or 2013 in xlsx format with a single number in cell A1 of sheet 1.

Then, edit the underlying XML and replace that cell's value to look like this:

<v>62408.000000000007</v>

That number has 17 significant digits, and cannot be represented exactly in IEEE 754 floating point.

Parsing the string "62408.000000000007" as a double-precision floating-point number in Java and Python gives 62408.00000000001, which has one fewer significant digit; it is the shortest decimal string that round-trips to the same double. Both of these programming languages claim to implement (a subset of) IEEE 754.
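For reference, this behavior can be checked directly in Python (a sketch; the ulp arithmetic below assumes the standard IEEE 754 binary64 layout, where the spacing between adjacent doubles near 62408 is 2**-37):

```python
# The file's value "62408.000000000007" is not exactly representable.
# The nearest double is exactly one ulp (2**-37) above 62408.
s = "62408.000000000007"
x = float(s)

assert x == 62408 + 2**-37  # correctly rounded result of parsing s

# repr() prints the shortest decimal that round-trips to the same double,
# which here has 16 significant digits rather than the original 17.
print(repr(x))  # 62408.00000000001
```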

However, Excel 2010 and 2013, presented with that file, display 62408 in the UI (and no matter how many decimal places you specify in the number format, only zeroes appear after the decimal point). So Excel seems to parse that cell value as 62408 exactly.

Can anyone point me to a definitive reference for how applications should parse a floating point number from a SpreadsheetML (xlsx) file's v element inside a cell?

What would also be useful is a definitive reference on how Excel does it.

I have tried to examine the Office Open XML standard reference documents at http://www.ecma-international.org/publications/standards/Ecma-376.htm

However, beyond finding that the v element has type ST_Xstring in this context, I can't find anything about how to parse cell values, especially as numbers.

"Can anyone point me to a definitive reference for how applications should parse a floating point number from a SpreadsheetML (xlsx) file's v element inside a cell?"

I doubt there is one, but I can share some experience from writing libraries that produce xls and xlsx files in four different programming languages.

Excel uses standard IEEE 754 floating point. When writing xlsx files it needs to encode those values as strings, and any variation in the digits beyond the 15th is probably due to printf-style formatting.
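That printf-style explanation can be reproduced directly: formatting the parsed double with 17 significant digits (C's %.17g, sketched here in Python) yields exactly the string found in the file. The choice of %.17g is an assumption about how the writer formats values, not something the file format specifies.

```python
# A double exactly one ulp above 62408 (exact binary value
# 62408.0000000000072759576141834259033203125).
x = 62408 + 2**-37

# 17 significant digits is the minimum that round-trips every double,
# so a writer using "%.17g" would emit the string seen in the xlsx file:
print("%.17g" % x)  # 62408.000000000007
```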

It may display 62408.000000000007 as 62408, but internally it still handles it as an IEEE 754 double. This was more evident in the xls format, where the value was saved exactly as it was in memory: a 64-bit IEEE 754 double.
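A small Python sketch of that xls-era round trip (the little-endian "<d" layout is assumed here for illustration; the point is that storing the raw 8 bytes of the double involves no decimal conversion at all):

```python
import struct

x = float("62408.000000000007")

# Pack the double's raw bit pattern into 8 bytes, as a binary cell
# record would, then unpack it again.
raw = struct.pack("<d", x)
y, = struct.unpack("<d", raw)

assert y == x  # bit-for-bit round trip, exact by construction
```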

So, to answer the "how applications should parse a floating point number" part of your question: applications should parse them with whatever facility they have available for converting the string representation of a double into an in-memory double. If your application is compiled with the same compiler as Excel, you will probably get exactly the same results via the same system library. If not, you will most likely get the same result anyway.

However, this does not guarantee that the number will display as an integer when it is really a double. That is something that Excel the application does, and it is not related to the file format.

"So Excel seems to parse that cell value as 62408 exactly."

I would think that the "seems to" part is exactly right, and that what you are seeing is due to the presentation layer. I doubt that Excel does, or could, parse a value exactly if it cannot be represented exactly in the IEEE 754 format.
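One plausible model of that presentation layer, sketched in Python: Excel is generally documented as displaying at most 15 significant digits, and formatting the stored double to 15 significant digits collapses it back to "62408" even though the stored double is not exactly 62408. This is an assumption about the display step, not a claim about Excel's actual implementation.

```python
x = float("62408.000000000007")  # stored as 62408 + 2**-37, not 62408

# Formatting to 15 significant digits reproduces what the UI shows,
# without the underlying double ever being exactly 62408.
print(f"{x:.15g}")  # 62408
assert x != 62408
```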
