Expression evaluation in C vs Java

int y=3;
int z=(--y) + (y=10);

When executed in C, z evaluates to 20, but the same expression executed in Java gives z the value 12.

Can anyone explain why this is happening and what is the difference?

when executed in C language the value of z evaluates to 20

No, it does not. This is undefined behavior, so z could get any value, including 20. In theory the program could do anything at all, since the standard places no requirements on a program that invokes undefined behavior. Read more here: Undefined, unspecified and implementation-defined behavior

As a rule of thumb, never modify a variable twice in the same expression.

The reason for the undefined behavior here is sequence points: y is modified twice with no sequence point in between. The following question is not an exact duplicate, but it explains things in more depth: Why are these constructs using pre and post-increment undefined behavior?

In C, the order in which the operands of arithmetic operators such as + and / are evaluated is not specified by the standard, so if evaluating them has side effects, your program becomes unpredictable. Here is an example:

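#include <stdio.h>

int foo(void)
{
    printf("foo()\n");
    return 0;
}

int bar(void)
{
    printf("bar()\n");
    return 0;
}

int main(void)
{
    /* The compiler may call foo() and bar() in either order. */
    int x = foo() + bar();
    (void)x; /* x is unused; only the side effects matter here */
}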

What will this program print? Well, we don't know. This snippet does not invoke undefined behavior, because the two function calls cannot interleave, but the order in which they are called is unspecified, so the output is still not predictable. I asked a separate question about that: Is it undefined behavior to use functions with side effects in an unspecified order?

Some other operators, like || and &&, do have a specified order of evaluation (left to right), and this feature is used for short-circuiting. For instance, if we take the example functions above and write foo() && bar(), only foo() will be executed, because it returns 0 and the result is already known to be false.

I'm not very proficient in Java, but for completeness, I want to mention that Java basically does not have undefined or unspecified behavior except for very special situations. Almost everything in Java is well defined. For more details, read rzwitserloot's answer
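For contrast, here is a minimal Java sketch of the same foo() + bar() example. Java evaluates the operands of + left to right, so the order of the two calls is fixed. The class name CallOrder, the calls list, and the run() helper are made up for illustration; the sketch records the calls in a list instead of printing them directly.

```java
import java.util.ArrayList;
import java.util.List;

public class CallOrder {
    // Records the order in which foo() and bar() are actually called.
    static final List<String> calls = new ArrayList<>();

    static int foo() { calls.add("foo()"); return 0; }
    static int bar() { calls.add("bar()"); return 0; }

    // Evaluates foo() + bar() and returns a copy of the recorded call order.
    static List<String> run() {
        calls.clear();
        int x = foo() + bar(); // left operand is evaluated before the right one
        return new ArrayList<>(calls);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [foo(), bar()]
    }
}
```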

There are 3 parts to this answer:

  1. How this works in C (undefined behaviour)
  2. How this works in Java (the spec is clear on how this should be evaluated)
  3. Why is there a difference.

For #1, you should read @klutt's fantastic answer.

For #2 and #3, you should read this answer.

How does it work in java?

Unlike C's, Java's language specification pins down far more. For example, C only requires int to have at least 16 bits and leaves the exact width to the implementation, whereas the Java lang spec fixes it: 32 bits, even on 64-bit processors and a 64-bit Java implementation.
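A small sketch to check the fixed width of int on whatever JVM you happen to run. The class name IntWidth and the intBits() helper are made up for illustration; Integer.SIZE and Integer.MAX_VALUE are standard library constants.

```java
public class IntWidth {
    // Number of bits in a Java int, as reported by the standard library.
    static int intBits() {
        return Integer.SIZE;
    }

    public static void main(String[] args) {
        System.out.println(intBits());         // 32
        System.out.println(Integer.MAX_VALUE); // 2147483647, i.e. 2^31 - 1
    }
}
```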

The java spec clearly says that x+y is to be evaluated left-to-right (vs. C's 'in any order you please, compiler'), thus, first --y is evaluated which is clearly 2 (with the side-effect of making y 2), and then y=10 is evaluated which is clearly 10 (with the side effect of making y 10), and then 2+10 is evaluated which is clearly 12.
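The walk-through above can be checked with a short sketch. The class name EvalOrder and the evaluate() helper are made up for illustration; the expression itself is the one from the question.

```java
public class EvalOrder {
    // Returns {z, y} after evaluating the expression from the question.
    static int[] evaluate() {
        int y = 3;
        // Left operand first: --y yields 2. Then the right operand: y = 10 yields 10.
        int z = (--y) + (y = 10);
        return new int[] { z, y };
    }

    public static void main(String[] args) {
        int[] result = evaluate();
        System.out.println("z = " + result[0]); // z = 12
        System.out.println("y = " + result[1]); // y = 10
    }
}
```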

Obviously, then, a language like Java is just better? After all, undefined behaviour is pretty much a bug by definition. What was wrong with the C lang spec writers that they introduced this crazy stuff?

The answer is: performance.

In C, your source code is turned into machine code by the compiler, and the machine code is then executed by the CPU. A 2-step model.

In Java, your source code is turned into bytecode by the compiler, the bytecode is then turned into machine code by the runtime, and the machine code is then executed by the CPU. A 3-step model.

If you want to introduce optimizations, you don't control what the CPU does, so for C there is only 1 step where it can be done: Compilation.

So C (the language) is designed to give lots of freedom to C compilers to attempt to produce optimized machine code. This is a cost/benefit scenario: At the cost of having a ton of 'undefined behaviour' in the lang spec, you get the benefit of better optimizing compilers.

In Java, you get a second step, and that's where Java does its optimizations: at runtime. java.exe does it to class files; javac.exe is quite 'stupid' and optimizes almost nothing. This is on purpose: at runtime you can do a better job (for example, you can use some bookkeeping to track which of two branches is more commonly taken and thus branch predict better than a C app ever could). It also means that the cost/benefit analysis now results in: the lang spec should be clear as day.

So java code is never undefined behaviour?

Not so. Java has a memory model which includes a ton of undefined behaviour:

public class Race {
    static class X { int a, b; }

    public static void main(String[] args) {
        X instance = new X();

        // Both threads read and write the same fields with no synchronization.
        new Thread() { public void run() {
            int a = instance.a;
            int b = instance.b;
            instance.a = 5;
            instance.b = 6;
            System.out.print(a);
            System.out.print(b);
        }}.start();

        new Thread() { public void run() {
            int a = instance.a;
            int b = instance.b;
            instance.a = 1;
            instance.b = 2;
            System.out.print(a);
            System.out.print(b);
        }}.start();
    }
}

is undefined in Java. It may print 0056, 0012, 0010, 0002, 5600, 0600, and many more possibilities. Something like 5000 (which it could legally print) is hard to imagine: how can the read of a 'work' but the read of b then fail?

For the exact same reason your C code produces arbitrary answers:

Optimization.

'Hardcoding' in the spec exactly how this code must behave would come at a large cost: you'd take away most of the room for optimization. So Java made the same trade-off and now has a lang spec that is ambiguous whenever you modify/read the same fields from different threads without establishing so-called 'happens-before' relationships using e.g. synchronized.
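As a rough sketch of what such a guard can look like, the version below wraps each thread's reads and writes in a synchronized block on the shared object. The names SyncDemo, runOnce and seen are made up for illustration, the lambdas replace the anonymous Thread subclasses, and locking on instance is just one possible choice of monitor. The observed values are returned instead of printed so the result is easy to inspect.

```java
public class SyncDemo {
    static class X { int a, b; }

    // Runs both threads once and returns what each observed: {a1, b1, a2, b2}.
    static int[] runOnce() {
        X instance = new X();
        int[] seen = new int[4];

        Thread t1 = new Thread(() -> {
            synchronized (instance) { // read and write as one atomic step
                seen[0] = instance.a;
                seen[1] = instance.b;
                instance.a = 5;
                instance.b = 6;
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (instance) {
                seen[2] = instance.a;
                seen[3] = instance.b;
                instance.a = 1;
                instance.b = 2;
            }
        });

        t1.start();
        t2.start();
        try {
            // join() makes the threads' writes to seen visible to the caller.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] s = runOnce();
        // Either 0056 (first thread got the lock first) or 1200 (second did).
        System.out.println("" + s[0] + s[1] + s[2] + s[3]);
    }
}
```

Which thread gets the lock first is still up to the scheduler, but whichever runs second now sees both writes of the other, so a torn result such as 5000 can no longer occur.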

When executed in C language the value of z evaluates to 20

That is not true. The compiler you use evaluates it to 20. Another one may evaluate it in a completely different way: https://godbolt.org/z/GcPsKh

This kind of behaviour is called Undefined Behaviour.

In your expression you have two problems.

  1. The order of evaluation of operands (except for the logical operators) is not specified in C (Unspecified Behaviour)
  2. The expression also modifies y twice without an intervening sequence point (Undefined Behaviour)
