
Are the increment/decrement operators Undefined Behaviour in Bash?

In standards C99 and C11 an expression like the following is UB (Undefined Behaviour):

 int x = 2;
 int ans = x++  +  x++;  

In Bash the increment/decrement operators are defined, and the official documentation on gnu.org says that the conventions of standard C are followed.

In addition, Bash is mostly POSIX-conforming, and the POSIX standard ( http://pubs.opengroup.org/onlinepubs/9699919799/ ) says that, for arithmetic operations, the C standard is assumed unless stated otherwise.

Since I cannot find more information, my conclusion is that in Bash we also have Undefined Behaviour with increment operators:

x=2
echo $(( x++ + x++ ))

I would like to know whether my conclusion is right or whether, on the contrary, there is some convention in Bash that supersedes the C standard.

Additional note: Trying in my system (Ubuntu 14.04, Bash version 4.3.11) it seems that left to right evaluation is performed, with an increment that is immediately done where the operator ++ appears.
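
A minimal sketch of that experiment, assuming bash (the script is not meant for other shells, which need not accept `++` at all). The values in the comment are what left-to-right evaluation with an immediate increment produces; they record an observation about this implementation, not something a specification guarantees.

```shell
#!/usr/bin/env bash
# Observation only: reproduces the experiment described above.
x=2
ans=$(( x++ + x++ ))   # first x++ yields 2 (x becomes 3), second yields 3 (x becomes 4)
echo "ans=$ans x=$x"   # observed with bash: ans=5 x=4
```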

Looking at the bash code (bash 4.3), source expr.c , I see the following:

      /* post-increment or post-decrement */
      if (stok == POSTINC || stok == POSTDEC)
        {
          /* restore certain portions of EC */
          tokstr = ec.tokstr;
          noeval = ec.noeval;
          curlval = ec.lval;
          lasttok = STR;    /* ec.curtok */

          v2 = val + ((stok == POSTINC) ? 1 : -1);
          vincdec = itos (v2);
          if (noeval == 0)
            {
#if defined (ARRAY_VARS)
              if (curlval.ind != -1)
                expr_bind_array_element (curlval.tokstr, curlval.ind, vincdec);
              else
#endif
                expr_bind_variable (tokstr, vincdec);
            }
          free (vincdec);
          curtok = NUM; /* make sure x++=7 is flagged as an error */
        }

As you can see, the post increment is not implemented with the C post-increment:

v2 = val + ((stok == POSTINC) ? 1 : -1);

IMHO, with that code, and the fact that a line is processed token by token from left to right, I could say the behaviour is well defined.

You seem to be seeking to find a defined standard for bash, as there is for C. Unfortunately, there isn't one.

There are really only two guides:

  1. Volume 3 (Shell and Utilities) of the Open Group Base Specifications, commonly known as "Posix", which is deliberately underspecified. It does state that arithmetic evaluation "be equivalent to that described in Section 6.5, Expressions, of the ISO C standard," but that standard does not specify any value for an expression which contains both a mutator and another use of the same variable (such as x + x++ or x++ + x++ ).

  2. The actual behaviour of and manual for a particular version of bash, neither of which qualifies as a formal specification, and neither of which is in any way certified or blessed by any standards organization.

Consequently, I would have to say that no document defines the result in bash of arithmetic evaluation of an expression like x++ + x++ . Even if the bash manual for the current bash version specified it (which it doesn't), and even if it were possible to deduce the behaviour from examination of the source code of the current bash version (which it is, but not necessarily easily), that would not be in any sense of the word a formal specification.

That makes the result "undefined" in the intuitive sense of the word: no result is defined.
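
If a script needs one particular result, one option is to avoid the construct and sequence the side effects explicitly, so that each statement reads or modifies `x` only once. The following is a sketch under that assumption; the names `first` and `second` are illustrative and do not come from bash or from any standard.

```shell
x=2
first=$x               # value used as the left operand
x=$(( x + 1 ))         # first increment completes before x is read again
second=$x              # value used as the right operand
x=$(( x + 1 ))         # second increment
ans=$(( first + second ))
echo "ans=$ans x=$x"   # ans=5 x=4
```

Written this way, the result no longer depends on the order in which an implementation evaluates operands or applies increments within a single expression.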

There is certainly no law requiring that every programming language be fully specified, and many are not. Indeed, while C and C++ both enjoy exhaustive definitions in the form of ISO standards, there are many usages which the standards deliberately do not define, partly because doing so would require forms of error-detection which would impede performance. These decisions have been and will continue to be controversial, and I have no intention of taking sides.

I will simply observe that in the context of a formal specification, the following are not the same:

  • The requirement that the result of evaluating a particular program construct be detected as an error .

  • The explicit statement that the value of a particular construct is implementation-defined .

  • The failure to specify the value of a particular construct, or the explicit statement that it is unspecified.

  • The explicit statement that the result of evaluating a particular construct is undefined .

The first two of these are definitely formal specifications, in that they imply that the result of an evaluation is well-defined (in the second case, the definition should/must appear in the implementation manual). Such constructs are definitely usable, although of course the implementation-specific constructs will render the program non-portable.

The third does not accurately define the value of the construct it describes, although it will often provide a list of possibilities. The fourth, which is a C/C++ specialty, specifies that the construct is not valid and that it is the programmer's responsibility to avoid it because the standard imposes no requirements whatsoever on an implementation. Such constructs should never be used.

Examples of the four cases taken from specific specifications:

  • Error detection. (Java) "if the value of the divisor in an integer division is 0, then an ArithmeticException is thrown."

  • Implementation-specific. (Posix shell) "Open files are represented by decimal numbers starting with zero. The largest possible value is implementation-defined ; however, all implementations shall support at least 0 to 9, inclusive, for use by the application."

  • Unspecified behaviour. (C/C++) "An example of unspecified behavior is the order in which the arguments to a function are evaluated." (from the definitions section of the C99 standard).

  • Undefined behaviour. (C) "In both operations [ / and % ], if the value of the second operand is zero, the behavior is undefined." (Contrast with Java, above.)

The last two categories seem to trigger a kind of outrage amongst certain programmers: disbelief that a specification could leave behaviour unspecified, and even a suspicion that the absence of a specification is some kind of conspiracy to hide the truth (which must, therefore, be set free). This in turn leads to random experimentation with particular language implementations, which is futile precisely because the standard does not require all implementations to do the same thing, or even require a particular implementation to do the same thing consistently.

It's also important to avoid thinking of "undefined behaviour" as a specific behaviour. If a computation with UB had a specific behaviour, it wouldn't be undefined . Even if the computation had a range of possible specific behaviours, it would be merely unspecified.

"Undefined behaviour" is not a specification or an attribute. You cannot detect "undefined behaviour" because the lack of definition means that any behaviour is possible, including behaviour which is the defined result of some other construct.

In particular, "undefined behaviour" is not the same as a detected error because the implementation is under no obligation to detect it.
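
A shell-level illustration of that last point, offered as a sketch rather than a statement of what any standard requires: the C rules quoted above leave division by zero undefined, yet bash chooses to trap it and flag an error. The variable names `divisor` and `outcome` are illustrative only.

```shell
divisor=0
# Run the expansion in a subshell so that a failure cannot abort the caller.
if ( : "$(( 1 / divisor ))" ) 2>/dev/null; then
    outcome="no error reported"
else
    outcome="error reported"
fi
echo "$outcome"
```

An error here shows only what this particular shell decided to do. It does not turn the construct into a specified error, and a script relying on the diagnostic would be relying on the implementation rather than on a specification.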
