Does the C++ standard allow this floating-point behaviour?

Question

In the following code:

#include <cstdint>
#include <cinttypes>
#include <cstdio>

using namespace std;

int main() {
    double xd = 1.18;
    int64_t xi = 1000000000;

    int64_t res1 = (double)(xi * xd);  // cast the product to double, then truncate to int64_t

    double d = xi * xd;                // store the product in a double variable
    int64_t res2 = d;                  // then truncate to int64_t

    printf("%" PRId64"\n", res1);
    printf("%" PRId64"\n", res2);
}

Using g++ v4.9.3 with -std=c++14, targeting 32-bit Windows, I get this output:

1179999999
1180000000

Are these values allowed to be different?

I expected that, even if the compiler uses a higher internal precision than double for the computation of xi * xd, it should do this consistently. Loss of precision in floating conversion is implementation-defined, and the precision of this calculation is also implementation-defined: [c.limits]/3 says that FLT_EVAL_METHOD should be imported from C99. IOW, I expected that it should not be allowed to use a different precision for xi * xd on one line than it does on another line.

Note: this is intentionally a C++ question and not a C question; I believe the two languages have different rules in this area.

Answer 1

even if the compiler uses a higher internal precision than double for the computation of xi * xd, it should do this consistently

Whether required or not (discussed below), this clearly doesn't happen: Stack Overflow is littered with questions from people who've seen similar-seeming calculations change for no ostensible reason within the same program.

The C++ Standard draft n3690 says (emphasis mine):

The values of the floating operands and the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.[62]

[62] The cast and assignment operators must still perform their specific conversions as described in 5.4, 5.2.9 and 5.17.

So, in agreement with MM's comment and contrary to my earlier edit, it's the version with the (double) cast that must be rounded to a 64-bit double before truncation to integer. For these inputs the correctly rounded double is exactly 1180000000.0, so the 1179999999 printed for res1 in the question suggests the cast was not honoured and the wider product was truncated directly. The more general case without footnote 62 leaves the compiler free not to round early in the other case.

[c.limits]/3 says that FLT_EVAL_METHOD should be imported from C99. IOW I expected that it should not be allowed to use a different precision for xi * xd on one line than it does on another line.

Check the cppreference page:

Regardless of the value of FLT_EVAL_METHOD, any floating-point expression may be contracted, that is, calculated as if all intermediate results have infinite range and precision (unless #pragma STDC FP_CONTRACT is off).

As tmyklebu comments, it continues:

Cast and assignment strip away any extraneous range and precision: this models the action of storing a value from an extended-precision FPU register into a standard-sized memory location.

This last point agrees with footnote 62 of the Standard.

MM comments:

STDC FP_CONTRACT does not seem to appear in the C++ Standard and also it's not clear to me exactly to what extent the C99 behaviour is 'imported'

It doesn't appear in the draft I looked at. That suggests C++ doesn't guarantee its availability, leaving the default mentioned above that "any floating-point expression may be contracted". But per MM's comments and the Standard and cppreference quotes above, we know the (double) cast is an exception that forces rounding to 64 bits.

The C++ Standard draft mentioned above says of <cfloat>:

The contents are the same as the Standard C library header <float.h>. See also: ISO C 7.1.5, 5.2.4.2.2, 5.2.4.2.1.

If one of those C Standards required STDC FP_CONTRACT, there's more chance of it being portable for use by C++ programs, but I've not surveyed implementations for support.

Answer 2

Depending on FLT_EVAL_METHOD, xi * xd may be calculated with higher precision than double. If xi were so large that it could not be represented exactly in double, I'm not even sure whether the compiler would be allowed to convert it exactly to long double; probably not, because that conversion happens before anything covered by FLT_EVAL_METHOD. There is no requirement that higher precision be used consistently.

There are two places where conversion to double must happen: at the (double) cast, and at the assignment to a double. There have been gcc versions where the cast to double was "optimised" away if a value was already "officially" a double (like xi * xd here), even though in reality it held higher precision; that "optimisation" was always a bug, because a cast must convert.

So you may have run into this bug, where a cast to double wasn't performed (if the bug is still there). You may have run into inconsistent use of higher precision, which is legal if FLT_EVAL_METHOD allows it. Or you may even have run into inconsistent use of higher precision when FLT_EVAL_METHOD didn't allow it at all, which would again be a bug (not the inconsistency, but the use of higher precision in the first place).


2 answers

Answer 1
3 2015-12-21 05:19:42

Answer 2
2 2015-12-21 08:56:25
