Are all functions in C/C++ assumed to return?

I was reading this paper on undefined behaviour and one of the example "optimisations" looks highly dubious:

    if (arg2 == 0)
        ereport(ERROR,
                (errcode(ERRCODE_DIVISION_BY_ZERO),
                 errmsg("division by zero")));
    /* No overflow is possible */
    PG_RETURN_INT32((int32) arg1 / arg2);

Figure 2: An unexpected optimization voids the division-by-zero check, in src/backend/utils/adt/int8.c of PostgreSQL. The call to ereport(ERROR, ...) will raise an exception.

Essentially, the compiler assumes that ereport will return, and removes the arg2 == 0 check since the presence of the division implies a non-zero denominator, i.e. arg2 != 0.

Is this a valid optimisation? Is the compiler free to assume that a function will always return?

EDIT: The whole thing depends on ereport, which is described thus:

    /*----------
     * New-style error reporting API: to be used in this way:
     *      ereport(ERROR,
     *              (errcode(ERRCODE_UNDEFINED_CURSOR),
     *               errmsg("portal \"%s\" not found", stmt->portalname),
     *               ... other errxxx() fields as needed ...));
     *
     * The error level is required, and so is a primary error message (errmsg
     * or errmsg_internal).  All else is optional.  errcode() defaults to
     * ERRCODE_INTERNAL_ERROR if elevel is ERROR or more, ERRCODE_WARNING
     * if elevel is WARNING, or ERRCODE_SUCCESSFUL_COMPLETION if elevel is
     * NOTICE or below.
     *
     * ereport_domain() allows a message domain to be specified, for modules that
     * wish to use a different message catalog from the backend's.  To avoid having
     * one copy of the default text domain per .o file, we define it as NULL here
     * and have errstart insert the default text domain.  Modules can either use
     * ereport_domain() directly, or preferably they can override the TEXTDOMAIN
     * macro.
     *
     * If elevel >= ERROR, the call will not return; we try to inform the compiler
     * of that via pg_unreachable().  However, no useful optimization effect is
     * obtained unless the compiler sees elevel as a compile-time constant, else
     * we're just adding code bloat.  So, if __builtin_constant_p is available,
     * use that to cause the second if() to vanish completely for non-constant
     * cases.  We avoid using a local variable because it's not necessary and
     * prevents gcc from making the unreachability deduction at optlevel -O0.
     *----------
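For anyone who wants to see the shape of what that comment describes, here is a minimal sketch of the pattern. It assumes GCC or Clang (for `__builtin_constant_p` and `__builtin_unreachable`). The names `report`, `report_impl`, `guarded_div`, `LVL_WARNING` and `LVL_ERROR` are hypothetical stand-ins of mine, not PostgreSQL's actual definitions.

```c
#include <stdio.h>
#include <stdlib.h>

enum { LVL_WARNING = 19, LVL_ERROR = 20 };   /* hypothetical levels */

/* Hypothetical reporter: returns for warnings, terminates for errors. */
void report_impl(int level, const char *msg)
{
    fprintf(stderr, "%s: %s\n", level >= LVL_ERROR ? "ERROR" : "WARNING", msg);
    if (level >= LVL_ERROR)
        abort();                 /* never returns on this path */
}

/*
 * When the level is a compile-time constant >= LVL_ERROR, tell the
 * compiler that control cannot continue past the call.  For any other
 * level the second if() is false and no hint is given.
 */
#define report(level, msg) \
    do { \
        report_impl((level), (msg)); \
        if (__builtin_constant_p(level) && (level) >= LVL_ERROR) \
            __builtin_unreachable(); \
    } while (0)

/* The division is reachable only when b != 0. */
int guarded_div(int a, int b)
{
    if (b == 0)
        report(LVL_ERROR, "division by zero");
    return a / b;
}
```

With a constant `LVL_WARNING` the macro behaves like a plain call that returns. With a constant `LVL_ERROR` the compiler is told explicitly that the code after the call is unreachable.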

Is the compiler free to assume that a function will always return?

It is not legal in C or C++ for a compiler to optimize on that basis, unless it somehow specifically knows that ereport returns (for example by inlining it and inspecting the code).

ereport depends on at least one #define and on the values passed in, so I can't be sure, but it certainly looks to be designed to conditionally not return (and it calls an extern function errstart that, as far as the compiler knows, may or may not return). So if the compiler really is assuming that it always returns then either the compiler is wrong, or the implementation of ereport is wrong, or I've completely misunderstood it.

The paper says,

However, the programmer failed to inform the compiler that the call to ereport(ERROR, ...) does not return.

I don't believe that the programmer has any such obligation, unless perhaps there's some non-standard extension in effect when compiling this particular code that enables an optimization documented to break valid code under certain conditions.

Unfortunately it is rather difficult to prove the code transformation is incorrect by citing the standard, since I can't quote anything to show that there isn't, tucked away somewhere in pages 700-900, a little clause that says "oh, by the way, all functions must return". I haven't actually read every line of the standard, but such a clause would be absurd: functions need to be allowed to call abort() or exit() or longjmp(). In C++ they can also throw exceptions. And they need to be allowed to do this conditionally -- the attribute noreturn means that the function never returns, not that it might not return, and its absence proves nothing about whether the function returns or not. My experience of both standards is that they aren't (that) absurd.

Optimizations are not allowed to break valid programs; they're constrained by the "as-if" rule that observable behaviour is preserved. If ereport doesn't return then the "optimization" changes the observable behaviour of the program (from doing whatever ereport does instead of returning, to having undefined behaviour due to the division by zero). Hence it is forbidden.
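To make the conditional case concrete, here is a minimal sketch of a reporter that returns for warnings and leaves via longjmp() for errors. All names (`raise_report`, `checked_div`, `try_div`, `recovery`, the level constants) are hypothetical, and this is not how ereport is implemented; it only shows that a well-defined program can rely on the guarded division never being reached.

```c
#include <setjmp.h>

enum { LVL_WARNING = 1, LVL_ERROR = 2 };   /* hypothetical levels */

jmp_buf recovery;                /* where error reports jump back to */
int warnings_seen = 0;

/* Returns for warnings; for errors, jumps to `recovery` and never returns. */
void raise_report(int level)
{
    if (level >= LVL_ERROR)
        longjmp(recovery, 1);
    warnings_seen++;
}

/* Same shape as the code in the question: the check guards the division. */
int checked_div(int a, int b)
{
    if (b == 0)
        raise_report(LVL_ERROR);
    return a / b;                /* not executed when b == 0 */
}

/* Stores a / b in *out and returns 1, or returns 0 if an error was reported. */
int try_div(int a, int b, int *out)
{
    if (setjmp(recovery) != 0)
        return 0;                /* arrived here via longjmp */
    *out = checked_div(a, b);
    return 1;
}
```

Calling `try_div(1, 0, &q)` has fully defined behaviour in the abstract machine: the division is never executed, so a transformation that executes it first would change what the program does.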

There's more information on this particular issue here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616180

It mentions a GCC bug report http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29968 that was (rightly IMO) rejected, but if ereport doesn't return then the PostgreSQL issue is not the same as that rejected GCC bug report.

The Debian bug description contains the following:

The gcc guys are full of it. The issue that is relevant here is the C standard's definition of sequence points, and in particular the requirement that visible side effects of a later statement cannot happen before the execution of an earlier function call. The last time I pestered them about this, I got some lame claim that a SIGFPE wasn't a side effect within the definitions of the spec. At that point useful discussion stopped, because it's impossible to negotiate with someone who's willing to claim that.

In point of fact, if a later statement has UB then it is explicitly stated in the standard that the whole program has UB. Ben has the citation in his answer. It is not the case (as this person seems to think) that all visible side effects must occur up to the last sequence point before the UB. UB permits inventing a time machine (and more prosaically, it permits out-of-order execution that assumes everything executed has defined behaviour). The gcc guys are not full of it if that's all they say.

A SIGFPE would be a visible side effect if the compiler chooses to guarantee and document (as an extension to the standard) that it occurs, but if it's just the result of UB then it is not. Compare for example the -fwrapv option to GCC, which changes integer overflow from UB (what the standard says) to wrap-around (which the compiler guarantees, only if you specify the option). On MIPS, gcc has an option -mcheck-zero-division, which looks like it does define behaviour on division by zero, but I've never used it.
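Code that has to behave the same with or without such options can avoid the undefined operations in the first place. A minimal sketch in standard C, with hypothetical helper names `checked_add` and `checked_div`:

```c
#include <limits.h>
#include <stdbool.h>

/* Adds a and b without ever evaluating an overflowing signed addition. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;            /* a + b would overflow: UB without -fwrapv */
    *sum = a + b;
    return true;
}

/* a / b is undefined for b == 0 and for INT_MIN / -1. */
bool checked_div(int a, int b, int *quot)
{
    if (b == 0 || (a == INT_MIN && b == -1))
        return false;
    *quot = a / b;
    return true;
}
```

Because the tests run before the arithmetic and the arithmetic is skipped when they fail, no execution contains an undefined operation, whichever flags are in effect.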

It's possible that the authors of the paper noticed the wrongness of that complaint against GCC, and the thought that one of the PostgreSQL authors was wrong in this way influenced them when they put the snigger quotes into:

We found seven similar issues in PostgreSQL, which were noted as "GCC bugs" in source code comments

But a function not returning is very different from a function returning after some side effects. If it doesn't return, the statement that would have UB is not executed within the definition of the C (or C++) abstract machine in the standard. Unreached statements aren't executed: I hope this isn't contentious. So if the "gcc guys" were to claim that UB from unreached statements renders the whole program undefined, then they'd be full of it. I don't know that they have claimed that, and at the end of the Debian report there's a suggestion that the issue might have gone away by GCC 4.4. If so then perhaps PostgreSQL indeed had encountered an eventually-acknowledged bug, not (as the author of the paper you link to thinks) a valid optimization or (as the person who says the gcc guys are full of it thinks) a misinterpretation of the standard by GCC's authors.

I think the answer is found, at least for C++, in section 1.9p5:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

In fact, the macro expands to a call to errstart which will return (ERROR >= ERROR), obviously true. That triggers a call to errfinish, which calls proc_exit, which runs some registered cleanup and then the Standard runtime function exit. So there is no possible execution that contains a divide-by-zero. However, the compiler logic testing this must have gotten it wrong. Or perhaps an earlier version of the code failed to properly exit.

It seems to me that unless the compiler can prove that ereport() doesn't call exit() or abort() or some other mechanism for program termination then this optimization is invalid. The language standard mentions several mechanisms for termination, and even defines the 'normal' program termination via returning from main() in terms of the exit() function.

Not to mention that program termination isn't necessary to avoid the division expression. for (;;) {} is perfectly valid C.

No. In the latest C standard, C11, there is even a new keyword to specify that a function does not return: _Noreturn.
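A minimal sketch of how that keyword is used, assuming a C11 compiler. The functions `fatal` and `divide_or_die` are hypothetical examples:

```c
#include <stdio.h>
#include <stdlib.h>

/* C11: declares that control never returns to the caller. */
_Noreturn void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

int divide_or_die(int a, int b)
{
    if (b == 0)
        fatal("division by zero");
    return a / b;                /* only reached when b != 0 */
}
```

Including `<stdnoreturn.h>` additionally provides the spelling `noreturn` for the same keyword.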

The paper does not say that the if (arg2 == 0) check is removed. It says that the division is moved before the check.

Quoting the paper:

... GCC moves the division before the zero check arg2 == 0, causing division by zero.

The result is the same, but the reasoning is different.

If the compiler believes ereport will return, then it "knows" that the division will be performed in all cases. Furthermore, the if-statement doesn't affect the arguments of the division. And obviously, the division doesn't affect the if-statement. And while the call to ereport might have observable side effects, the division does not (if we ignore any divide-by-zero exception).

Therefore, the compiler believes the as-if rule gives it the freedom to reorder these statements with respect to each other--it can move the division before the test because the observable behavior should be identical (for all of the cases that yield defined behavior).
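Written out as source, the reasoning looks roughly like this. It is only a sketch: `log_problem` is a hypothetical reporter that really does return (which is what the compiler is assumed to believe about ereport), and `as_reordered` is my hand-written rendering of the reordering, not actual compiler output.

```c
int reports = 0;

/* Hypothetical reporter that DOES return to its caller. */
void log_problem(const char *msg)
{
    (void)msg;
    reports++;
}

/* Source order: check first, then divide. */
int as_written(int a, int b)
{
    if (b == 0)
        log_problem("division by zero");
    return a / b;                /* still executed when b == 0: undefined */
}

/* Reordered: divide first, then check. */
int as_reordered(int a, int b)
{
    int q = a / b;               /* undefined when b == 0; many targets trap here */
    if (b == 0)
        log_problem("division by zero");
    return q;
}
```

For every b != 0 the two functions behave identically, and for b == 0 both are undefined as long as the reporter returns. The equivalence breaks down only if the reporter does not return, which is exactly the case under dispute.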

One way to look at it is that undefined behavior includes time travel. ;-)

I'd argue that undefined behavior (e.g., dividing by 0) should be considered observable behavior. That would prevent this reordering because the observable behavior of the division must happen after the observable behavior of the call to ereport. But I don't write standards or compilers.

In embedded systems, functions that never return are commonplace. They should not be optimized away either.

For example, a common design is to have a forever loop in main() (aka the background loop), with all functionality taking place in ISRs (Interrupt Service Routines).

Another example is RTOS tasks. In our embedded system project, we have tasks that are in an infinite loop: pend on message queue, process message, repeat. They will do this for the life of the project.
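A minimal sketch of such a task, with a toy queue standing in for the RTOS one. Everything here (`msg_queue`, `queue_post`, `queue_pend`, `task_step`, `task_main`) is hypothetical; a real RTOS pend call would block instead of reporting an empty queue.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8

typedef struct {
    int items[QUEUE_CAPACITY];
    size_t head;                 /* index of the oldest message */
    size_t count;                /* number of queued messages */
} msg_queue;

bool queue_post(msg_queue *q, int msg)
{
    if (q->count == QUEUE_CAPACITY)
        return false;            /* queue full */
    q->items[(q->head + q->count) % QUEUE_CAPACITY] = msg;
    q->count++;
    return true;
}

/* Stand-in for a blocking pend: returns false when the queue is empty. */
bool queue_pend(msg_queue *q, int *msg)
{
    if (q->count == 0)
        return false;
    *msg = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}

long processed_total = 0;

/* One iteration of the task: pend on the queue, process the message. */
void task_step(msg_queue *q)
{
    int msg;
    if (queue_pend(q, &msg))
        processed_total += msg;
}

/* The task itself: repeats forever and never returns. */
void task_main(msg_queue *q)
{
    for (;;)
        task_step(q);
}
```

Splitting the loop body into `task_step` keeps the per-message logic testable, while `task_main` remains a function that by design never returns.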

Some embedded systems have safe shutdown loops where they place the machine into a safe state, lock out all user input, and wait for power shutdown or reset.

Also, some embedded systems can shut down the system. Shutting down the power prevents the system from returning.

There are reasons why not all functions need to return or should be required to return. If all the functions in your cell phone returned, you wouldn't be fast enough to use it.

Most functions are assumed to eventually return. There are compiler-specific extensions in some compilers to inform the compiler that a function will never return.

__attribute__ ((noreturn)) does this for GCC.
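A minimal sketch of how the attribute might be applied. The macro name `NORETURN` and the functions `halt` and `element_at` are hypothetical; on compilers that don't define `__GNUC__` the macro expands to nothing, so the code still compiles but the hint is lost.

```c
#include <stdlib.h>

/* GCC/Clang extension; expands to nothing on other compilers. */
#if defined(__GNUC__)
#  define NORETURN __attribute__((noreturn))
#else
#  define NORETURN
#endif

NORETURN void halt(void);

void halt(void)
{
    abort();                     /* never returns */
}

int element_at(const int *items, int count, int index)
{
    if (index < 0 || index >= count)
        halt();
    return items[index];         /* only reached for a valid index */
}
```

Note that the attribute is a promise made by the programmer: if a function marked this way does return, the behaviour is undefined.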
