
Unusual floating point numbers in tables

I am building a system to read tables from heterogeneous documents and would like to know the best way of managing (columns of) floating point numbers. Where the column can be represented as real numbers I will use List<Double>. (I'm using Java, but experience from other languages would be useful.) I also wish to serialize the table as a CSV file. Thus a table might look like:

"material", "mass (g)", "volume (cm3)",
"iron", 7.8, 1.0,
"aluminium", 27.3, 9.9,

and column 2 (1-based) would be represented by a List<Double>

{new Double(7.8), new Double(27.3)} 

I may also wish to compute the density (mass/volume) and derive a new column ("density (g.cm-3)") as a List

{new Double(7.8), new Double(2.76)} 
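
A minimal sketch of such a derived column, for illustration only. The class name DensityColumn and the helper divide are hypothetical, and mapping a missing (null) input to Double.NaN is an assumption of this sketch rather than an established convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DensityColumn {
    // Element-wise mass / volume over two columns of equal length.
    // A null (missing) input yields NaN; that mapping is an assumption
    // of this sketch, chosen so later arithmetic does not throw.
    static List<Double> divide(List<Double> mass, List<Double> volume) {
        List<Double> result = new ArrayList<>();
        for (int i = 0; i < mass.size(); i++) {
            Double m = mass.get(i);
            Double v = volume.get(i);
            result.add(m == null || v == null ? Double.NaN : m / v);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> mass = Arrays.asList(7.8, 27.3, null);
        List<Double> volume = Arrays.asList(1.0, 9.9, 2.0);
        // The third entry is NaN because its mass is missing.
        System.out.println(divide(mass, volume));
    }
}
```

Unboxing a null Double throws a NullPointerException, so null entries need an explicit check like the one above, whereas NaN passes through arithmetic without one.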

However, the input values are sometimes missing, unusual or represented by fuzzy concepts. Some transformations may throw exceptions (which I would catch and replace by one of the special values listed below). Examples include:

1.0E+10000
>10
10 / 0.0 (i.e. divide by zero)
Math.sqrt(-1.)
Math.tan(Math.PI/2.0)

I have the following options in Java for unusual values of a list element:

  1. null reference
  2. Double.NaN
  3. Double.MAX_VALUE
  4. Double.POSITIVE_INFINITY

Are there protocols for when the Java unusual values above should be used? I have read this question on how they behave. (I would like to rely on chaining of their operations.) And if there are protocols, can the values be serialized and read back in? (E.g., does Java parse "0x7ff0000000000000L" to a number equal to Double.POSITIVE_INFINITY?)

I am prepared for some loss of precision in specification (there are often errors in OCR, missing digits, etc., so this is a "good enough" exercise).

You have three problems that you ought to separate to some extent:

  1. What representation should you use for table entries, which might be numbers, numbered quantities of some units, or other things?

  2. How might floating-point infinities and NaNs serve you?

  3. How can floating-point objects be serialized (written to a file and read from a file)?

Regarding these:

  1. You have not specified enough information here for good advice about how to represent table entries. From what you describe, there is no reason to use floating point at all. This is because you have not specified what operations you want to perform on the entries other than reading and writing them. If you do not need to do arithmetic, there is no reason to bother converting values to floating point, or to any other number-arithmetic system. You could simply maintain the entries as their original text. This makes serialization trivial.

  2. Floating-point infinities act like mathematical infinity, by design. Infinity plus a number other than infinity remains infinity, et cetera. You should use floating-point infinities to represent mathematical infinities. You should avoid using floating-point infinities to represent overflows, unless you do not care about losing the values that overflow. Floating-point NaNs are intended to represent “not a number”. A NaN is often used to represent something like “An error occurred, so we do not have a number here to give you. You should do something else in this place.” Then it is up to the application to supply the something else, perhaps by having supplementary information from another source or in a parallel data structure. Errors include things such as taking the square root of a negative number or failing to initialize some data. (E.g., some underlying software initializes floating-point data to NaNs, so that, if you do not initialize it yourself, NaNs remain.) You should generally treat NaNs as “empty places” that you must not use rather than as tokens representing something.

  3. When writing and reading floating-point values, you should take care to convert the values exactly or ensure that the errors you introduce in conversion are tolerable. If you must convert to text (human-readable numerals) rather than writing in “binary” (bytes with arbitrary values), then it may be preferable to write in a notation that uses a numeric base compatible with the native radix of the floating-point system (e.g., hexadecimal floating-point numerals for binary floating-point representations, such as 0x3.4p-2 for .8125). If this is not feasible, then you need to produce enough digits (when converting to decimal) to represent the floating-point value accurately enough to recover the original value when reading it, and you need to ensure the conversion software converts without introducing additional errors. You must also handle special values such as infinities and NaNs.
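
The serialization routes in point 3 might be sketched in Java roughly as follows. The class DoubleRoundTrip and its method names are hypothetical. The sketch relies on the documented behaviour that Double.toString emits enough decimal digits to identify the value uniquely, and that Double.parseDouble accepts hexadecimal floating-point strings as well as "Infinity" and "NaN". The bit-pattern route stores the same 64 bits as the 0x7ff0000000000000L mentioned in the question, written here as 16 hex digits without the Java literal prefix and suffix.

```java
public class DoubleRoundTrip {
    // Decimal text: Double.toString emits enough digits for
    // Double.parseDouble to recover the identical double.
    static double viaDecimal(double x) {
        return Double.parseDouble(Double.toString(x));
    }

    // Hexadecimal floating-point text, e.g. "0x1.ap-1" for 0.8125.
    static double viaHex(double x) {
        return Double.parseDouble(Double.toHexString(x));
    }

    // Raw IEEE 754 bit pattern written as 16 hexadecimal digits.
    static String toBitsText(double x) {
        return String.format("%016x", Double.doubleToLongBits(x));
    }

    // Inverse of toBitsText.
    static double fromBitsText(String s) {
        return Double.longBitsToDouble(Long.parseUnsignedLong(s, 16));
    }

    public static void main(String[] args) {
        double[] samples = {0.8125, 0.1, Double.MAX_VALUE,
                Double.POSITIVE_INFINITY, Double.NaN};
        for (double x : samples) {
            // One line per sample: decimal | hex float | bit pattern
            System.out.println(Double.toString(x) + " | "
                    + Double.toHexString(x) + " | " + toBitsText(x));
        }
    }
}
```

Of the three forms, the decimal one is the most likely to be read correctly by other CSV consumers; the hex and bit-pattern forms are exact but may not be understood outside Java or C-family tools.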

(Note that Math.tan(Math.PI/2) is not infinity and does not cause an exception because Math.PI/2 is not exactly π/2, so its tangent is finite, not infinity.)
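
As a quick check of how the expressions from the question behave, here is a small sketch. The class SpecialValues and the helper classify are hypothetical names introduced for illustration; the library calls themselves are standard.

```java
public class SpecialValues {
    // Sorts a double into one of three coarse categories.
    static String classify(double x) {
        if (Double.isNaN(x)) return "NaN";
        if (Double.isInfinite(x)) return "infinite";
        return "finite";
    }

    public static void main(String[] args) {
        // Overflowing text parses to Infinity rather than throwing.
        System.out.println(classify(Double.parseDouble("1.0E+10000")));
        // Floating-point division by zero gives Infinity, no exception.
        System.out.println(classify(10 / 0.0));
        // Square root of a negative number gives NaN.
        System.out.println(classify(Math.sqrt(-1.0)));
        // Large but finite, because Math.PI / 2 is not exactly pi/2.
        System.out.println(classify(Math.tan(Math.PI / 2.0)));
        // Chaining: Infinity absorbs finite values, NaN absorbs everything.
        System.out.println(classify(Double.POSITIVE_INFINITY + 1.0));
        System.out.println(classify(Double.POSITIVE_INFINITY - Double.POSITIVE_INFINITY));
        System.out.println(classify(Double.NaN * 0.0));
    }
}
```

A string such as ">10" is not a number at all: Double.parseDouble throws NumberFormatException for it, so fuzzy entries like that need handling before any conversion to double.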
