
Order of evaluation of array indices (versus the expression) in C

Looking at this code:

static int global_var = 0;

int update_three(int val)
{
    global_var = val;
    return 3;
}

int main()
{
    int arr[5];
    arr[global_var] = update_three(2);
}

Which array entry gets updated? 0 or 2?

Is there a part in the specification of C that indicates the precedence of operation in this particular case?

Order of Left and Right Operands

To perform the assignment in arr[global_var] = update_three(2), the C implementation must evaluate the operands and, as a side effect, update the stored value of the left operand. C 2018 6.5.16 (which is about assignments) paragraph 3 tells us there is no sequencing between the left and right operands:

The evaluations of the operands are unsequenced.

This means the C implementation is free to compute the lvalue arr[global_var] first (by "computing the lvalue," we mean figuring out what this expression refers to), then to evaluate update_three(2), and finally to assign the value of the latter to the former; or to evaluate update_three(2) first, then compute the lvalue, then assign the former to the latter; or to evaluate the lvalue and update_three(2) in some intermixed fashion and then assign the right value to the left lvalue.
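To make the first two orders concrete, here is a minimal sketch that spells each of them out as separate, sequenced statements. The helper names lvalue_first and call_first are hypothetical, and a compiler is not obliged to generate either sequence literally; the sketch only shows which element each order ends up writing, starting from global_var == 0.

```c
static int global_var = 0;

/* Same function as in the question. */
int update_three(int val)
{
    global_var = val;
    return 3;
}

/* Hypothetical: the lvalue arr[global_var] is computed first,
   then update_three(2) runs, then the store happens. */
int lvalue_first(int arr[5])
{
    global_var = 0;
    int *target = &arr[global_var];   /* index read while global_var == 0 */
    int value = update_three(2);      /* global_var becomes 2 */
    *target = value;                  /* store goes to arr[0] */
    return (int)(target - arr);       /* index that was actually written */
}

/* Hypothetical: update_three(2) runs first, then the lvalue is computed. */
int call_first(int arr[5])
{
    global_var = 0;
    int value = update_three(2);      /* global_var becomes 2 first */
    int *target = &arr[global_var];   /* index read as 2 */
    *target = value;                  /* store goes to arr[2] */
    return (int)(target - arr);
}
```

Either sequence is a legitimate way to carry out arr[global_var] = update_three(2), which is exactly why the answer to "0 or 2?" depends on the implementation.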

In all cases, the assignment of the value to the lvalue must come last, because 6.5.16 3 also says:

… The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands…

Sequencing Violation

Some might wonder whether the behavior is undefined because global_var is both used and separately updated, in violation of 6.5 2, which says:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined…

It is quite familiar to many C practitioners that the behavior of expressions such as x + x++ is not defined by the C standard because they both use the value of x and separately modify it in the same expression without sequencing. However, in this case, we have a function call, which provides some sequencing. global_var is used in arr[global_var] and is updated in the function call update_three(2).

6.5.2.2 10 tells us there is a sequence point before the function is called:

There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call…

Inside the function, the expression in global_var = val; is a full expression, and so is the 3 in return 3;, per 6.8 4:

A full expression is an expression that is not part of another expression, nor part of a declarator or abstract declarator…

Then there is a sequence point between these two expressions, again per 6.8 4:

… There is a sequence point between the evaluation of a full expression and the evaluation of the next full expression to be evaluated.

Thus, the C implementation may evaluate arr[global_var] first and then do the function call, in which case there is a sequence point between them because there is one before the function call, or it may evaluate global_var = val; in the function call and then arr[global_var], in which case there is a sequence point between them because there is one after the full expression. So the behavior is unspecified (either of those two things may be evaluated first), but it is not undefined.
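The same reasoning can be seen in a small sketch that contrasts the x + x++ case with a version where the modification is moved into a function. The helper names post_increment and sum_with_call are hypothetical. Because the call is indeterminately sequenced with the read of x in the caller, the result is one of two values rather than undefined.

```c
/* Hypothetical helper: stores old + 1 into *p and returns the old value.
   The write happens inside the function body, so it is separated from
   the caller's read of the same object by sequence points. */
static int post_increment(int *p)
{
    int old = *p;
    *p = old + 1;
    return old;
}

/* x + x++ would be undefined behavior. Moving the increment into a
   function call makes the result merely unspecified: the left operand x
   may be read before the call (start + start) or after it
   ((start + 1) + start). */
int sum_with_call(int start)
{
    int x = start;
    return x + post_increment(&x);
}
```

A caller can rely only on getting one of the two values, for example 10 or 11 when start is 5, which mirrors the "0 or 2" situation in the question.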

The result here is unspecified.

While the order of operations in an expression, which dictates how subexpressions are grouped, is well defined, the order of evaluation is not specified. In this case it means that either global_var could be read first or the call to update_three could happen first, but there's no way to know which.

There is no undefined behavior here because a function call introduces a sequence point, as does every statement in the function, including the one that modifies global_var.

To clarify, the C standard defines undefined behavior in section 3.4.3 as:

undefined behavior

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

and defines unspecified behavior in section 3.4.4 as:

unspecified behavior

use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

The standard states that the order of evaluation of the operands of the assignment is unspecified (they are unsequenced), which in this case means that either arr[0] gets set to 3 or arr[2] gets set to 3.

I tried and I got the entry 0 updated.

However, according to this question: will right hand side of an expression always evaluated first

The order of evaluation is unspecified and unsequenced. So I think code like this should be avoided.

As it makes little sense to emit code for an assignment before you have a value to assign, most C compilers will first emit code that calls the function and saves the result somewhere (register, stack, etc.), then they will emit code that writes this value to its final destination, and therefore they will read the global variable after it has been changed. Let us call this the "natural order", not defined by any standard but by pure logic.

Yet in the process of optimization, compilers will try to eliminate the intermediate step of temporarily storing the value somewhere and try to write the function result as directly as possible to the final destination. In that case, they often have to read the index first, e.g. into a register, to be able to move the function result directly into the array. This may cause the global variable to be read before it was changed.

So this is effectively unpredictable behavior (formally unspecified rather than undefined), with the very bad property that the result may well differ depending on whether optimization is performed and how aggressive that optimization is. It's your task as a developer to resolve that issue by coding either:

int idx = global_var;
arr[idx] = update_three(2);

or coding:

int temp = update_three(2);
arr[global_var] = temp;

As a good rule of thumb: unless global variables are const (or they are not, but you know that no code will ever change them as a side effect), you should never use them directly in code, because in a multi-threaded environment even this can be undefined:

int result = global_var + (2 * global_var);
// Is not guaranteed to be equal to `3 * global_var`!

The compiler may read it twice, and another thread can change the value between the two reads. Yet, again, optimization will often cause the code to read it only once, so you may again get different results, which now also depend on the timing of another thread. Thus you will have a lot less headache if you copy global variables into a temporary local variable before use. Keep in mind that if the compiler thinks this is safe, it will most likely optimize even that copy away and use the global variable directly, so in the end it may make no difference in performance or memory use.
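A sketch of the "copy it to a local first" advice follows. One assumption is swapped in here: with a plain int, unsynchronized access from two threads is a data race, which C11 makes undefined behavior, so the sketch declares the shared variable _Atomic using C11 <stdatomic.h>. The names shared_counter, triple_two_reads and triple_snapshot are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical shared variable; _Atomic so concurrent access is not a data race. */
static _Atomic int shared_counter = 0;

/* Two separate loads: another thread may store in between, so the result
   need not be 3 times any single value the variable ever held. */
int triple_two_reads(void)
{
    return atomic_load(&shared_counter) + 2 * atomic_load(&shared_counter);
}

/* One load into a local: the arithmetic is then always consistent,
   3 times the one value that was observed (reported through *seen). */
int triple_snapshot(int *seen)
{
    int local = atomic_load(&shared_counter);
    *seen = local;
    return local + 2 * local;
}
```

In a single-threaded run both functions return the same value; the difference only shows up when another thread stores to shared_counter between the two loads in triple_two_reads.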

(Just in case someone asks why anyone would write x + 2 * x instead of 3 * x: on some CPUs addition is ultra-fast, and so is multiplication by a power of two, since the compiler turns it into a bit shift (2 * x == x << 1), yet multiplication by arbitrary numbers can be very slow. So instead of multiplying by 3, you get much faster code by shifting x left by 1 and adding x to the result. Modern compilers even perform that trick themselves if you multiply by 3 and turn on aggressive optimization, unless the target is a modern CPU where multiplication is as fast as addition, since there the trick would slow down the calculation.)
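The identity the parenthetical relies on can be checked with a short sketch. It uses unsigned 32-bit arithmetic (an assumption, not from the text) so that overflow wraps modulo 2^32 instead of being undefined, which keeps the two forms equal for every input. Whether a given compiler actually emits a shift and an add for 3u * x depends on the target and optimization level.

```c
#include <stdint.h>

/* 3 * x computed as x + 2 * x, with the doubling done as a left shift. */
uint32_t times3_shift_add(uint32_t x)
{
    return (x << 1) + x;
}

/* Plain multiplication, for comparison. */
uint32_t times3_multiply(uint32_t x)
{
    return 3u * x;
}
```

For example, times3_shift_add(14u) gives 42, and for UINT32_MAX both functions wrap to the same value, 4294967293.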

Global edit: sorry guys, I got all fired up and wrote a lot of nonsense. Just an old geezer ranting.

I wanted to believe C had been spared, but alas, since C11 it has been brought on par with C++. Apparently, knowing what the compiler will do with side effects in expressions now requires solving a little maths riddle involving a partial ordering of code sequences based on an "is located before the synchronization point of".

I happen to have designed and implemented a few critical real-time embedded systems back in the K&R days (including the controller of an electric car that could send people crashing into the nearest wall if the engine was not kept in check, a 10 tons industrial robot that could squash people to a pulp if not properly commanded, and a system layer that, though harmless, would have a few dozen processors suck their data bus dry with less than 1% system overhead).

I might be too senile or stupid to get the difference between undefined and unspecified, but I think I still have a pretty good idea of what concurrent execution and data access mean. In my arguably informed opinion, this obsession of the C++ and now C guys with their pet languages taking over synchronization issues is a costly pipe dream. Either you know what concurrent execution is, and you don't need any of these gizmos, or you don't, and you would do the world at large a favour not trying to mess with it.

All this truckload of eye-watering memory barrier abstractions is simply due to a temporary set of limitations of the multi-CPU cache systems, all of which can be safely encapsulated in common OS synchronization objects like, for instance, the mutexes and condition variables C++ offers.
The cost of this encapsulation is but a minute drop in performance compared with what a use of fine-grained specific CPU instructions could achieve in some cases.
The volatile keyword (or a #pragma dont-mess-with-that-variable for all I, as a system programmer, care) would have been quite enough to tell the compiler to stop reordering memory accesses. Optimal code can easily be produced with direct asm directives to sprinkle low level driver and OS code with ad hoc CPU specific instructions. Without an intimate knowledge of how the underlying hardware (cache system or bus interface) works, you're bound to write useless, inefficient or faulty code anyway.

A minute adjustment of the volatile keyword, and Bob would have been the uncle of everybody but the most hardboiled low level programmers. Instead of that, the usual gang of C++ maths freaks had a field day designing yet another incomprehensible abstraction, yielding to their typical tendency to design solutions looking for non-existent problems and mistaking the definition of a programming language for the specs of a compiler.

Only this time the change required defacing a fundamental aspect of C too, since these "barriers" had to be generated even in low level C code to work properly. That, among other things, wrought havoc in the definition of expressions, with no explanation or justification whatsoever.

As a conclusion, the fact that a compiler could produce consistent machine code from this absurd piece of C is only a distant consequence of the way C++ guys coped with potential inconsistencies of the cache systems of the late 2000s.
It made a terrible mess of one fundamental aspect of C (expression definition), so that the vast majority of C programmers - who don't give a damn about cache systems, and rightly so - are now forced to rely on gurus to explain the difference between a = b() + c() and a = b + c.

Trying to guess what will become of this unfortunate array is a net loss of time and effort anyway. Regardless of what the compiler will make of it, this code is pathologically wrong. The only responsible thing to do with it is send it to the bin.
Conceptually, side effects can always be moved out of expressions, with the trivial effort of explicitly letting the modification occur before or after the evaluation, in a separate statement.
This kind of shitty code might have been justified in the 80's, when you could not expect a compiler to optimize anything. But now that compilers have long become more clever than most programmers, all that remains is a piece of shitty code.
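As a sketch of that advice applied to the a = b() + c() case, assume b() and c() both record their calls in a shared trace (hypothetical functions and buffer, for illustration only). Written as one expression, the two calls are indeterminately sequenced and could run in either order; writing them as separate statements fixes the order, because each full expression is followed by a sequence point.

```c
#include <stddef.h>

/* Hypothetical trace buffer recording the order in which b() and c() run. */
static char trace[8];
static size_t trace_len = 0;

static int b(void) { trace[trace_len++] = 'b'; return 1; }
static int c(void) { trace[trace_len++] = 'c'; return 2; }

/* Instead of  return b() + c();  (trace could end up "bc" or "cb"),
   each call gets its own statement, so b() always runs before c(). */
int sum_in_fixed_order(void)
{
    trace_len = 0;
    int tb = b();
    int tc = c();
    trace[trace_len] = '\0';   /* terminate the trace as a string */
    return tb + tc;
}
```

After a call, trace always reads "bc"; the extra locals cost nothing in practice, since an optimizing compiler keeps them in registers.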

I also fail to understand the importance of this undefined / unspecified debate. Either you can rely on the compiler to generate code with a consistent behaviour or you can't. Whether you call that undefined or unspecified seems like a moot point.

In my arguably informed opinion, C is already dangerous enough in its K&R state. A useful evolution would be to add common-sense safety measures. For instance, the advanced code analysis the specs force compilers to implement could be put to use to at least generate warnings about bonkers code, instead of silently generating code that is potentially unreliable to the extreme.
But instead the guys decided, for instance, to define a fixed evaluation order in C++17. Now every software imbecile is actively incited to put side effects in his/her code on purpose, basking in the certainty that the new compilers will eagerly handle the obfuscation in a deterministic way.

K&R was one of the true marvels of the computing world. For twenty bucks you got a comprehensive specification of the language (I've seen single individuals write complete compilers just using this book), an excellent reference manual (the table of contents would usually point you within a couple of pages of the answer to your question), and a textbook that would teach you to use the language in a sensible way. Complete with rationales, examples and wise words of warning about the numerous ways you could abuse the language to do very, very stupid things.

Destroying that heritage for so little gain seems like a cruel waste to me. But again I might very well fail to see the point completely. Maybe some kind soul could point me in the direction of an example of new C code that takes significant advantage of these side effects?

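Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.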

 
© 2020-2024 STACKOOM.COM