What information is used when parsing a (float) number?

2014-03-24 19:27:41 | Tags: c++, c, locale, iostream, scanf

What information does the C++ Standard Library use when parsing a (float) number?

Here are the possibilities I know of for parsing a (single) float number with standard C++:

double atof( const char *str )
sscanf
double strtod( const char* str, char** str_end );
istringstream, via operator>>, or
via num_get directly
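
For illustration, the five approaches listed above can be put side by side. This is a minimal sketch, assuming the program runs in the default "C" locale; the helper names (parse_atof, parse_sscanf, and so on) are made up for this example and are not part of any library.

```cpp
#include <cstdio>    // std::sscanf
#include <cstdlib>   // std::atof, std::strtod
#include <ios>       // std::ios_base
#include <iterator>  // std::istreambuf_iterator
#include <locale>    // std::num_get, std::use_facet
#include <sstream>   // std::istringstream

// Hypothetical helpers: each parses the same text with one listed approach.
// On failure they return 0.0, which is a simplification for this sketch.

double parse_atof(const char* s) { return std::atof(s); }

double parse_sscanf(const char* s) {
    double v = 0.0;
    // sscanf returns the number of successful conversions.
    return std::sscanf(s, "%lf", &v) == 1 ? v : 0.0;
}

double parse_strtod(const char* s) {
    char* end = nullptr;
    double v = std::strtod(s, &end);
    // end == s means no characters were converted.
    return end != s ? v : 0.0;
}

double parse_stream(const char* s) {
    std::istringstream in(s);
    double v = 0.0;
    in >> v;  // operator>> delegates to the num_get facet of the stream's locale
    return v;
}

double parse_num_get(const char* s) {
    std::istringstream in(s);
    const auto& facet = std::use_facet<std::num_get<char>>(in.getloc());
    std::ios_base::iostate err = std::ios_base::goodbit;
    double v = 0.0;
    // Unlike operator>>, num_get::get does not skip leading whitespace.
    facet.get(std::istreambuf_iterator<char>(in),
              std::istreambuf_iterator<char>(), in, err, v);
    return v;
}
```

Under the stated assumption of the "C" locale, all five helpers should agree on an input such as "3.25".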

It seems obvious that, at the very least, we have to know what character is used as the decimal separator.

iostreams, in particular num_get::get, additionally talk about:

ios_base I/O format flags - Is there any information here that is used when parsing floating point?
the thousands separator, thousands_sep (* see below)

On the other hand, for std::strtod, which seems to be what sscanf is defined in terms of (and which in turn is referenced by num_get), the only variable information seems to be what is considered whitespace and what the decimal character is, although it doesn't seem to be specified where that is defined. (At least neither on cppref nor on MSDN.)

So, what information is actually used, and what constitutes a valid parseable float representation for the C++ Standard Library?

From what I see, only the decimal separator from the global locale (C or C++ ???) is needed. In addition, if the number contains a thousands separator, I would expect it to be parsed correctly only by num_get, since strtod / sscanf do not support the thousands separator.

(*) The group (thousands) separator is an interesting case to me. As far as I can tell, the "C" functions do not make any reference to it, and the last time I checked, the C and C++ standard printf function will never write it. So is it really processed by the strtod / scanf functions? (I know that there is a POSIX printf extension for the group separator, but that's not really standard, and it is notably missing from Microsoft's implementation.)
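
The expectation about the thousands separator can be checked with a small experiment. The sketch below is not a statement about every implementation: GroupedPunct is a hypothetical numpunct facet written for this example (',' as group separator, groups of three digits), and the strtod call assumes the global C locale is still "C".

```cpp
#include <cstddef>  // std::size_t
#include <cstdlib>  // std::strtod
#include <locale>   // std::locale, std::numpunct
#include <sstream>  // std::istringstream
#include <string>

// Hypothetical facet for illustration: '.' as decimal point,
// ',' as thousands separator, digits grouped in threes.
struct GroupedPunct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return '.'; }
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Parse via operator>> (which uses num_get) on a stream imbued with the facet.
// Returns false if the stream reports a parse failure.
bool parse_grouped(const std::string& text, double& out) {
    std::istringstream in(text);
    in.imbue(std::locale(in.getloc(), new GroupedPunct));
    in >> out;
    return !in.fail();
}

// Parse with strtod and report how many characters it consumed.
double parse_strtod_prefix(const std::string& text, std::size_t& consumed) {
    char* end = nullptr;
    double v = std::strtod(text.c_str(), &end);
    consumed = static_cast<std::size_t>(end - text.c_str());
    return v;
}
```

With these assumptions, parse_grouped("1,234.5", v) should succeed with v equal to 1234.5, while parse_strtod_prefix("1,234.5", n) should stop at the comma, returning 1.0 with n equal to 1.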

1 answer

The C11 spec for strtod() seems to have an opening big enough for any size of truck to drive through. It appears so open-ended that I see no limitation.

§7.22.1.3, paragraph 6: In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.

For non-"standard C" locales, isspace(), the decimal (radix) point, the group separator, the digits per group and the sign seem to constitute the typical variants. But apparently there is no limit.

For fun, I experimented with 500+ locales using printf(), sscanf(), strftime() and isspace().

All tested locales had a radix (decimal) point of '.' or ',', the same +/- sign, no digit grouping, and the expected 0-9.

strftime(... "%Y" ...) did not use a digit separator over years 1000-99999.

sscanf("1,234.5", "%lf", ...) and sscanf("1.234,5", "%lf", ...) did not produce 1234.5 in any locale.

All int values in the range 0 to 255 produced the same isspace() results, with the occasional exception of 154 and 160.

Of course these tests do not prove a limit to what may occur, but they do represent a sample of possibilities.
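
A per-locale probe along the lines of the experiment above might look like the following. This is a reconstruction for illustration, not the code actually used for the test: LocaleProbe and probe_locale are hypothetical names, the list of locale names to try is left to the caller, and the sketch resets the locale to "C" afterwards rather than restoring whatever was active before.

```cpp
#include <cctype>            // std::isspace
#include <clocale>           // std::setlocale, std::localeconv
#include <cstdio>            // std::sscanf
#include <initializer_list>  // braced list in the range-for below
#include <string>

// Hypothetical result record for one locale.
struct LocaleProbe {
    bool available;             // false if setlocale() rejected the name
    std::string decimal_point;  // localeconv()->decimal_point
    std::string thousands_sep;  // localeconv()->thousands_sep
    bool grouped_parsed;        // did sscanf read a grouped input as 1234.5?
    int space_count;            // how many values in 0..255 isspace() accepts
};

LocaleProbe probe_locale(const char* name) {
    LocaleProbe r{false, "", "", false, 0};
    if (std::setlocale(LC_ALL, name) == nullptr) return r;  // locale not installed
    r.available = true;

    const std::lconv* lc = std::localeconv();
    r.decimal_point = lc->decimal_point;
    r.thousands_sep = lc->thousands_sep;

    // Try both grouping conventions mentioned in the text.
    for (const char* text : {"1,234.5", "1.234,5"}) {
        double v = 0.0;
        if (std::sscanf(text, "%lf", &v) == 1 && v == 1234.5) r.grouped_parsed = true;
    }

    for (int c = 0; c <= 255; ++c)
        if (std::isspace(c)) ++r.space_count;

    std::setlocale(LC_ALL, "C");  // simplification: reset to "C", not the previous locale
    return r;
}
```

Calling probe_locale("C") should report '.' as the decimal point and grouped_parsed as false; results for other names depend entirely on which locales the system has installed.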