
Can volatile variables be read multiple times between sequence points?

I'm writing my own C compiler to learn as many details as possible about C. I'm now trying to understand exactly how volatile objects work.

What confuses me is that every read access in the code must be executed strictly (C11, 6.7.3p7):

An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine, as described in 5.1.2.3. Furthermore, at every sequence point the value last stored in the object shall agree with that prescribed by the abstract machine, except as modified by the unknown factors mentioned previously. What constitutes an access to an object that has volatile-qualified type is implementation-defined.

Example: in a = volatile_var - volatile_var; the volatile variable must be read twice, so the compiler can't optimise the statement to a = 0;

At the same time, the order of evaluation between sequence points is unspecified (C11, 6.5p3):

The grouping of operators and operands is indicated by the syntax. Except as specified later, side effects and value computations of subexpressions are unsequenced.

Example: in b = (c + d) - (e + f) the order in which the additions are evaluated is unspecified as they are unsequenced.

But when evaluating unsequenced subexpressions produces a side effect on the same object (as a volatile access does, for instance), the behaviour is undefined (C11, 6.5p2):

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

Does this mean that expressions like x = volatile_var - (volatile_var + volatile_var) are undefined? Should my compiler emit a warning if this occurs?

I've tried to see what Clang and GCC do. Neither throws an error or a warning. The generated assembly shows that the variables are NOT read in the execution order, but left to right instead, as shown in the RISC-V assembly below:

const int volatile thingy = 0;
int main()
{
    int new_thing = thingy - (thingy + thingy);
    return new_thing;
}

main:
        lui     a4,%hi(thingy)
        lw      a0,%lo(thingy)(a4)
        lw      a5,%lo(thingy)(a4)
        lw      a4,%lo(thingy)(a4)
        add     a5,a5,a4
        sub     a0,a0,a5
        ret

Edit: I am not asking "Why do compilers accept it?", I am asking "Is it undefined behavior if we strictly follow the C11 standard?". The standard seems to state that it is undefined behaviour, but I need a more precise reading to interpret it correctly.

Read literally, the standard (ISO 9899:2018) makes this undefined behavior.

C17 5.1.2.3/2 - definition of side effects:

Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects

C17 6.5/2 - sequencing of operands:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.

Thus, when reading the standard literally, volatile_var - volatile_var is definitely undefined behavior. It is actually undefined twice over, since both of the quoted sentences apply.


Please also note that this text changed quite a bit in C11. Previously, C99 6.5/2 said:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

That is, the behaviour was previously unspecified in C99 (unspecified order of evaluation) but was made undefined by the changes in C11.


That being said, other than re-ordering the evaluation as it pleases, a compiler doesn't really have any reason to do wild and crazy things with this expression, since there isn't much that can be optimized given volatile.

As a quality of implementation, mainstream compilers seem to maintain the previous "merely unspecified" behavior from C99.
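For code that has to behave the same whichever way 6.5/2 is read, one option is to give every volatile read its own full expression, so that a sequence point separates the reads. The following is a minimal sketch, assuming the left-to-right read order is the one wanted; the function name sub_of_sum and its parameter are hypothetical and not taken from the question.

```c
/* Hypothetical helper: computes v - (v + v) using three volatile reads.
   Each read is the initializer of its own declaration, which is a full
   expression, so the three reads happen in source order. */
int sub_of_sum(const volatile int *v)
{
    int first  = *v;   /* read 1 */
    int second = *v;   /* read 2, sequenced after read 1 */
    int third  = *v;   /* read 3, sequenced after read 2 */
    return first - (second + third);
}
```

The arithmetic on the temporaries is ordinary non-volatile code, so the compiler remains free to optimize it, but it still has to perform the three reads in the order written.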

Per C11, this is undefined behavior.

Per 5.1.2.3 Program execution, paragraph 2:

Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects ...

And 6.5 Expressions, paragraph 2:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

Note that, as this is your compiler, you are free to define the behavior should you wish.

As other answers have pointed out, accessing a volatile-qualified variable is a side effect, and side effects are interesting, and having multiple side effects between sequence points is especially interesting, and having multiple side effects that affect the same object between sequence points is undefined.

As an example of how/why it's undefined, consider this (wrong) code for reading a two-byte big-endian value from an input stream ifs:

uint16_t val = (getc(ifs) << 8) | getc(ifs);     /* WRONG */

This code imagines (in order to implement big-endianness, that is) that the two getc calls happen in left-to-right order, but of course that's not at all guaranteed, which is why this code is wrong.
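One way to pin the order down is to put each getc call in its own statement. This is a minimal sketch; the helper name read_be16 and its return convention (0 on success, -1 on a short read) are assumptions made for illustration.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: reads a two-byte big-endian value from fp.
   Returns 0 on success, -1 if either byte could not be read. */
int read_be16(FILE *fp, uint16_t *out)
{
    int hi = getc(fp);      /* first byte; the full expression ends here */
    if (hi == EOF)
        return -1;
    int lo = getc(fp);      /* second byte, sequenced after the first */
    if (lo == EOF)
        return -1;
    *out = (uint16_t)(((unsigned)hi << 8) | (unsigned)lo);
    return 0;
}
```

Checking for EOF before combining the bytes also avoids shifting a negative value, which the one-line version silently does at end of input.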

Now, one of the things the volatile qualifier is for is input registers. So if you've got a volatile variable

volatile uint8_t inputreg;

and if every time you read it you get the next byte coming in on some device — that is, if merely accessing the variable inputreg is like calling getc() on a stream — then you might write this code:

uint16_t val = (inputreg << 8) | inputreg;       /* ALSO WRONG */

and it's just about exactly as wrong as the getc() code above.
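The same repair applies to the register version: read the register once per statement and combine the bytes afterwards. A minimal sketch, assuming a device register that yields the next byte on every read; the function name read_reg_be16 and the pointer parameter are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: reg stands in for a device input register that
   delivers the next incoming byte each time it is read. */
uint16_t read_reg_be16(volatile uint8_t *reg)
{
    uint8_t hi = *reg;   /* first device read */
    uint8_t lo = *reg;   /* second device read, sequenced after the first */
    return (uint16_t)(((unsigned)hi << 8) | lo);
}
```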

The Standard has no terminology more specific than "Undefined Behavior" to describe actions which should be unambiguously defined on some implementations, or even the vast majority of them, but may behave unpredictably on others, based upon Implementation-Defined criteria. If anything, the authors of the Standard go out of their way to avoid saying anything about such behaviors.

The term is also used as a catch-all for situations where a potentially useful optimization might observably affect program behavior in some cases, to ensure that such optimizations will not affect program behavior in any defined situations.

The Standard specifies that the semantics of volatile-qualified accesses are "Implementation Defined", and there are platforms where certain kinds of optimizations involving volatile-qualified accesses might be observable if more than one such access occurs between sequence points. As a simple example, some platforms have read-modify-write operations whose semantics may be observably distinct from doing discrete read, modify, and write operations. If a programmer were to write:

void x(int volatile *dest, int volatile *src)
{
  *dest = *src | 1;
}

and the two pointers were equal, the behavior of such a function might depend upon whether a compiler recognized that the pointers were equal and replaced discrete read and write operations with a combined read-modify-write.

To be sure, such distinctions would be unlikely to matter in most cases, and would be especially unlikely to matter in cases where an object is read twice. Nonetheless, the Standard makes no attempt to distinguish situations where such optimizations would actually affect program behavior, much less those where they would affect program behavior in any way that actually mattered, from those where it would be impossible to detect the effects of such optimization. The notion that the phrase "non-portable or erroneous" excludes constructs which would be non-portable but correct on the target platform would lead to an interesting irony that compiler optimizations such as read-modify-write merging would be completely useless on any "correct" programs.

No diagnostic is required for programs with Undefined Behaviour, except where specifically mentioned. So it's not wrong to accept this code.

In general, it's not possible to know whether the same volatile storage is being accessed multiple times between sequence points (consider a function taking two volatile int* parameters, without restrict, as the simplest example where analysis is impossible).
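The two-pointer case can be sketched as follows. Both function names are hypothetical; the first shows the shape a compiler cannot analyze statically, the second a rewrite whose read order is fixed whether or not the pointers alias.

```c
/* Whether a and b point to the same object is only known at run time,
   so the compiler cannot, in general, tell whether this expression
   reads one volatile object twice without sequencing. */
int diff_unsequenced(volatile int *a, volatile int *b)
{
    return *a - *b;
}

/* Same computation with a sequence point between the two reads. */
int diff_sequenced(volatile int *a, volatile int *b)
{
    int lhs = *a;   /* read through a first */
    int rhs = *b;   /* then read through b */
    return lhs - rhs;
}
```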

That said, when you are able to detect a problematic situation, users might find it helpful, so I encourage you to work on getting a diagnostic out.
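For the cases that can be detected, a check might look roughly like the sketch below. Everything in it is an assumption made for illustration: the node layout, the names repeated_volatile_read and should_warn, and the fixed-size table are not from any real compiler. It only handles named variables combined by operators that impose no sequencing (such as + and -); operators like &&, ||, ?: and the comma operator, function calls, and accesses through pointers would need extra handling.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal expression tree; a real compiler's AST will differ. */
enum node_kind { NODE_VAR, NODE_BINOP };

struct node {
    enum node_kind kind;
    const char *name;         /* NODE_VAR: identifier */
    int is_volatile;          /* NODE_VAR: nonzero if volatile-qualified */
    const struct node *lhs;   /* NODE_BINOP: operands of an operator such */
    const struct node *rhs;   /* as + or - that imposes no sequencing     */
};

#define MAX_SEEN 32

struct seen {
    const char *names[MAX_SEEN];
    size_t count;
};

/* Returns 1 if some volatile identifier is read more than once in e,
   0 otherwise. Volatile reads are recorded in s as the tree is walked. */
static int repeated_volatile_read(const struct node *e, struct seen *s)
{
    if (e->kind == NODE_BINOP)
        return repeated_volatile_read(e->lhs, s)
            || repeated_volatile_read(e->rhs, s);

    if (!e->is_volatile)
        return 0;
    for (size_t i = 0; i < s->count; i++)
        if (strcmp(s->names[i], e->name) == 0)
            return 1;                 /* same volatile object read again */
    if (s->count < MAX_SEEN)
        s->names[s->count++] = e->name;
    return 0;
}

/* Entry point for one full expression: nonzero means "emit a warning". */
int should_warn(const struct node *full_expr)
{
    struct seen s = { {0}, 0 };
    return repeated_volatile_read(full_expr, &s);
}
```

Starting with an empty table for each full expression reflects the sequence point at its end. Comparing names with strcmp is a simplification; a compiler with a symbol table would more likely compare symbol entries.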

IMO it is legal but very bad.

    int new_thing = thingy - (thingy + thingy);

Multiple use of volatile variables in one expression is allowed and no warning is needed. But from the programmer's point of view, it is a very bad line of code.

Does this mean that expressions like x = volatile_var - (volatile_var + volatile_var) are undefined? Should my compiler throw an error if this occurs?

No, because the C standard does not say anything about how those reads have to be ordered; that is left to the implementation. All implementations known to me do it whichever way is easiest for them, as in this example: https://godbolt.org/z/99498141d

I think this depends on the sequence of compilation with regards to the volatility string as it relates to the sequence of compilation, as inversely proportional to (C/B) as variables occur. Your experience may vary.
