Expression evaluation in C vs Java
int y=3;
int z=(--y) + (y=10);
When executed in C, the value of z evaluates to 20, but the same expression executed in Java gives z the value 12.
Can anyone explain why this is happening and what the difference is?
when executed in C language the value of z evaluates to 20
No, it does not. This is undefined behavior, so z could get any value, including 20. The program could also theoretically do anything, since the standard does not say what the program should do when encountering undefined behavior. Read more here: Undefined, unspecified and implementation-defined behavior
As a rule of thumb, never modify a variable twice in the same expression.
It's not a good duplicate, but this question explains things in a bit more depth: Why are these constructs using pre and post-increment undefined behavior? The reason for undefined behavior here is sequence points.
In C, when it comes to arithmetic operators, like + and /, the order of evaluation of the operands is not specified in the standard, so if the evaluation of those has side effects, your program becomes unpredictable. Here is an example:
#include <stdio.h>

int foo(void)
{
    printf("foo()\n");
    return 0;
}

int bar(void)
{
    printf("bar()\n");
    return 0;
}

int main(void)
{
    /* The standard does not say whether foo() or bar() is called first. */
    int x = foo() + bar();
}
What will this program print? Well, we don't know. I'm not entirely sure if this snippet invokes undefined behavior or not, but regardless, the order of the output is not predictable. I asked a separate question about that, Is it undefined behavior to use functions with side effects in an unspecified order?, so I'll update this answer later.
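If you need a guaranteed order, one common way is to give each call its own statement. This is only a sketch: foo() and bar() mirror the example functions above, and sum_in_fixed_order is a name I made up for illustration.

```c
#include <stdio.h>

/* Helpers mirroring the example above: each one announces itself. */
static int foo(void) { printf("foo()\n"); return 0; }
static int bar(void) { printf("bar()\n"); return 0; }

/* Hypothetical helper: calls foo() strictly before bar() by putting
   each call in its own statement. */
int sum_in_fixed_order(void)
{
    int left = foo();   /* this initializer is fully evaluated first */
    int right = bar();  /* bar() can only run after foo() has returned */
    return left + right;
}
```

With the calls separated like this, the compiler no longer gets to choose which function runs first, so "foo()" is always printed before "bar()".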
Some other operators, like || and &&, have a specified (left-to-right) order of evaluation, and this feature is used for short-circuiting. For instance, if we take the example functions above and write foo() && bar(), only the foo() function will be executed, because foo() returns 0 and the result is already known.
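To see the short-circuiting for yourself, here is a small sketch. The counter calls_to_bar and the two wrapper functions are my own additions for illustration; foo() and bar() follow the example functions above.

```c
#include <stdio.h>

static int calls_to_bar = 0;  /* counts how often bar() actually runs */

static int foo(void) { printf("foo()\n"); return 0; }
static int bar(void) { printf("bar()\n"); calls_to_bar++; return 0; }

/* Returns how many times bar() ran while evaluating foo() && bar(). */
int bar_calls_after_and(void)
{
    calls_to_bar = 0;
    int result = foo() && bar();  /* foo() returns 0, so bar() is skipped */
    (void)result;
    return calls_to_bar;
}

/* Returns how many times bar() ran while evaluating foo() || bar(). */
int bar_calls_after_or(void)
{
    calls_to_bar = 0;
    int result = foo() || bar();  /* left side is 0, so bar() must run */
    (void)result;
    return calls_to_bar;
}
```

bar_calls_after_and() reports zero calls and bar_calls_after_or() reports one, and in both cases foo() runs first because these operators evaluate left to right.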
I'm not very proficient in Java, but for completeness, I want to mention that Java basically does not have undefined or unspecified behavior except for very special situations. Almost everything in Java is well defined. For more details, read rzwitserloot's answer.
There are 3 parts to this answer:
For #1, you should read @klutt's fantastic answer.
For #2 and #3, you should read this answer.
Unlike C's, Java's language specification is far more precise. For example, C doesn't even tell you exactly how many bits the data type int is supposed to have (it only guarantees at least 16), whereas the Java lang spec does: 32 bits. Even on 64-bit processors and a 64-bit Java implementation.
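If you want to check this on your own platform, something like the following works. It is a sketch only; the two function names are made up for illustration, and int32_t is an optional type from <stdint.h> that most mainstream implementations provide.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Storage size of plain int in bits on this implementation.
   The C standard only guarantees that int can hold at least 16 bits. */
size_t int_width_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Storage size of int32_t in bits: exactly 32 wherever the type exists. */
size_t int32_width_bits(void)
{
    return sizeof(int32_t) * CHAR_BIT;
}
```

So C code that needs a Java-style 32-bit integer would normally use int32_t rather than rely on whatever width int happens to have.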
The Java spec clearly says that x+y is to be evaluated left-to-right (vs. C's 'in any order you please, compiler'). Thus, first --y is evaluated, which is clearly 2 (with the side effect of making y 2), then y=10 is evaluated, which is clearly 10 (with the side effect of making y 10), and then 2+10 is evaluated, which is clearly 12.
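The left-to-right steps described above can be written out as separate statements, which is well defined in C too. This is a sketch with made-up function names; the second function only illustrates how a different order changes the result, not what any particular C compiler does with the original one-line expression.

```c
/* Evaluates the two operands of (--y) + (y = 10) left to right,
   one statement at a time, as the Java rule prescribes. */
int z_left_to_right(void)
{
    int y = 3;
    int left = --y;        /* y becomes 2, left is 2 */
    int right = (y = 10);  /* y becomes 10, right is 10 */
    return left + right;   /* 2 + 10 */
}

/* Same operands, but the right one is evaluated first. */
int z_right_to_left(void)
{
    int y = 3;
    int right = (y = 10);  /* y becomes 10, right is 10 */
    int left = --y;        /* y becomes 9, left is 9 */
    return left + right;   /* 9 + 10 */
}
```

Because every modification of y sits in its own statement, neither function has undefined behavior: the first returns 12, matching the Java result, and the second returns 19.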
Obviously, a language like Java is just better; after all, undefined behaviour is pretty much a bug by definition, so what was wrong with the C lang spec writers that they introduced this crazy stuff?
The answer is: performance.
In C, your source code is turned into machine code by the compiler, and the machine code is then executed by the CPU. A 2-step model.
In Java, your source code is turned into bytecode by the compiler, the bytecode is then turned into machine code by the runtime, and the machine code is then executed by the CPU. A 3-step model.
If you want to introduce optimizations, you don't control what the CPU does, so for C there is only 1 step where it can be done: compilation.
So C (the language) is designed to give lots of freedom to C compilers to attempt to produce optimized machine code. This is a cost/benefit scenario: at the cost of having a ton of 'undefined behaviour' in the lang spec, you get the benefit of better optimizing compilers.
In Java, you get a second step, and that's where Java does its optimizations: at runtime. java.exe does it to class files; javac.exe is quite 'stupid' and optimizes almost nothing. This is on purpose; at runtime you can do a better job (for example, you can use some bookkeeping to track which of two branches is more commonly taken and thus branch predict better than a C app ever could). It also means that the cost/benefit analysis now results in: the lang spec should be clear as day.
Not so. Java has a memory model which includes a ton of undefined behaviour:
public class Example {
    static class X { int a, b; }

    public static void main(String[] args) {
        X instance = new X();
        // Both threads read and write the same fields without synchronization.
        new Thread() { public void run() {
            int a = instance.a;
            int b = instance.b;
            instance.a = 5;
            instance.b = 6;
            System.out.print(a);
            System.out.print(b);
        }}.start();
        new Thread() { public void run() {
            int a = instance.a;
            int b = instance.b;
            instance.a = 1;
            instance.b = 2;
            System.out.print(a);
            System.out.print(b);
        }}.start();
    }
}
The output of this code is undefined in Java. It may print 0056, 0012, 0010, 0002, 5600, 0600, and many, many more possibilities. Something like 5000 (which it could legally print) is hard to imagine: how can the read of a 'work' but the read of b then fail?
For the exact same reason your C code produces arbitrary answers: optimization.
'Hardcoding' in the spec exactly how this code would behave would come at a large cost: you'd take away most of the room for optimization. So Java paid the cost and now has a lang spec that is ambiguous whenever you modify/read the same fields from different threads without establishing so-called 'happens-before' guards using e.g. synchronized.
When executed in C language the value of z evaluates to 20
That is not true. The compiler you use happens to evaluate it to 20. Another one can evaluate it in a completely different way: https://godbolt.org/z/GcPsKh
This kind of behaviour is called Undefined Behaviour.
In your expression you have two problems.