
C null pointer arithmetic

I noticed this warning from Clang:

warning: performing pointer arithmetic on a null pointer
has undefined behavior [-Wnull-pointer-arithmetic]

In detail, this is the code that triggers the warning:

int *start = ((int*)0);
int *end = ((int*)0) + count;

The integer constant zero is a null pointer constant. Converted to any pointer type it yields a null pointer, which does not point to any object but still has the pointer type needed to do pointer arithmetic.

Why would arithmetic on a null pointer be forbidden when doing the same on a non-null pointer obtained from an integer different than zero does not trigger any warning?

And more importantly, does the C standard explicitly forbid null pointer arithmetic?


Also, this code will not trigger the warning, but this is because the pointer is not evaluated at compile time:

int *start = ((int*)0);
int *end = start + count;

One way to avoid the null pointer arithmetic altogether is to convert an integer value to the pointer type explicitly (the result of such a conversion is implementation-defined):

int *end = (int *)(sizeof(int) * count);

The C standard does not allow it.

6.5.6 Additive operators (emphasis mine)

8 When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i-n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

For the purposes of the above, a pointer to a single object is considered as pointing into an array of 1 element.

Now, ((int*)0) does not point at an element of an array object, simply because a pointer holding a null pointer value does not point at any object. This follows from:

6.3.2.3 Pointers

3 If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

So you can't do arithmetic on it. The warning is justified: the pointer operand does not point to an element of an array object, so the clause "otherwise, the behavior is undefined" in 6.5.6 applies.
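For contrast, here is a minimal sketch of the arithmetic that 6.5.6 does define: the pointer operand points into a real array, and the result stays inside it or one past its end. The helper names `sum_range` and `sum_first` are illustrative assumptions, not something from the question.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the elements in [start, end). Both pointers must point into the
   same array object (end may be one past the last element). */
static int sum_range(const int *start, const int *end)
{
    assert(start != NULL && end != NULL);
    int total = 0;
    for (const int *p = start; p != end; ++p)
        total += *p;
    return total;
}

/* Hypothetical caller: the end pointer is derived from the array itself,
   never from a null pointer. The caller must guarantee that count does
   not exceed the number of elements in the array. */
static int sum_first(const int *values, size_t count)
{
    const int *end = values + count;  /* defined: at most one past the end */
    return sum_range(values, end);
}
```

With `int data[4] = {1, 2, 3, 4};`, the call `sum_first(data, 4)` forms `data + 4`, which is the one-past-the-end pointer and is never dereferenced.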

Don't be fooled by the fact the offsetof macro is possibly implemented like that. The standard library is not bound by the constraints placed on user programs. It can employ deeper knowledge. But doing this in our code is not well defined.
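In user code, the portable way to get a member offset is the `offsetof` macro from `<stddef.h>`, whatever the implementation does internally. A small sketch, assuming a hypothetical `struct packet_header` invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

struct packet_header {   /* hypothetical struct, for illustration only */
    char tag;
    int length;
    double timestamp;
};

/* The first member of a struct is always at offset 0. */
static_assert(offsetof(struct packet_header, tag) == 0,
              "first member must be at offset 0");

/* Byte offset of `length`, obtained from the implementation-provided
   macro rather than from hand-written arithmetic on a null pointer. */
static size_t length_offset(void)
{
    return offsetof(struct packet_header, length);
}
```

The exact value returned by `length_offset()` depends on the platform's padding rules, so no particular number is assumed here.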

A little clarification on this thread.

First of all, this is undefined behavior per the C standard for the reasons cited by StoryTeller:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Since the integer constant zero converted to any pointer type yields a null pointer, which does not point to any object, the behavior is undefined.

However, performing arithmetic operations on null pointers in order to retrieve offsets is not new. A common implementation of the offsetof macro uses it:

#define offsetof(st, m) ((size_t)&(((st *)0)->m))

Doing the same kind of arithmetic directly on pointers is also common:

int *end = (int *)0 + array_size;

This line is virtually the same as writing:

int *end = (int *)(sizeof(int) * array_size);

I believe that in practice the offset calculation is left to the implementation: the compiler "could" dereference such pointers in order to retrieve the actual memory offset, which is very improbable but still possible.
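If all that is needed is the byte offset itself, it can be computed with integer arithmetic alone, so no pointer (null or otherwise) is involved. A minimal sketch, assuming a hypothetical helper named `element_byte_offset`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index` in an array of int, computed with
   integers only. The assert rejects indices whose offset would not
   fit in size_t. */
static size_t element_byte_offset(size_t index)
{
    assert(index <= SIZE_MAX / sizeof(int));
    return sizeof(int) * index;
}
```

The result is a `size_t`, not a pointer. It can later be added to a `char *` that points at the start of a real buffer, provided the buffer is at least that large.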

Also, note that this warning for null pointer arithmetic is specific to Clang 6.0. GCC does not trigger it even with -fsanitize=undefined.

When the C Standard was written, the vast majority of C implementations would, for any non-void* pointer value p, uphold the invariants that p+0 and p-0 both yield p, and that p-p yields zero. More generally, operations like a size-zero memcpy or fwrite that operate on a buffer of size N would ignore the buffer address when N was zero. Such behavior would allow programmers to avoid having to write code to handle corner cases. For example, code to output a packet with an optional payload passed via address and length arguments would naturally process (NULL,0) as an empty payload.
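On an implementation that does not promise this traditional behavior, the (NULL,0) case can be handled explicitly. The Standard (through C17) leaves memcpy with a null pointer undefined even when the size is zero. The sketch below shows one way to write the optional-payload code defensively; the function name `append_payload` and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends an optional payload to `out`, which has capacity `cap` and
   already holds `used` bytes. Returns the new fill level, or `used`
   unchanged if the payload is empty, missing, or does not fit.
   A (NULL, 0) payload is treated as empty without touching the pointer. */
static size_t append_payload(unsigned char *out, size_t cap, size_t used,
                             const void *payload, size_t len)
{
    assert(out != NULL && used <= cap);
    if (len == 0)
        return used;   /* skip memcpy and pointer arithmetic entirely */
    if (payload == NULL || len > cap - used)
        return used;   /* nothing valid to copy, or not enough room */
    memcpy(out + used, payload, len);
    return used + len;
}
```

The explicit `len == 0` check costs one comparison and keeps the code within defined behavior on every conforming implementation.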

Nothing in the published Rationale for the C Standard suggests that implementations whose target platforms would naturally behave in such fashion shouldn't continue to work as they always had. There were, however, a few platforms where it may have been expensive to uphold such behavioral guarantees in cases where p is null.

As with most situations where the vast majority of C implementations would process a construct identically, but implementations might exist where such treatment would be impractical, the Standard characterizes the addition of zero to a null pointer as Undefined Behavior. The Standard allows implementations to, as a form of "conforming language extension", define the behavior of constructs in cases where it imposes no requirements, and it allows conforming (but not strictly conforming) programs to make use of them. According to the published Rationale, the stated intention was that support for such "popular extensions" be regarded as a "quality of implementation" issue to be decided by the marketplace. Implementations that could support them at essentially zero cost would do so, but implementations where such support would be expensive would be free to support such constructs or not based upon their customers' needs.

If one is using a compiler that targets commonplace platforms, and is designed to process the widest range of useful programs reasonably efficiently, then the extended semantics surrounding pointer arithmetic may allow one to write code more efficiently than would otherwise be possible. If one is targeting a compiler that does not value compatibility with quality compilers, however, one should recognize that it may treat the Standard's allowance for quirky hardware as an invitation to behave nonsensically even on commonplace hardware. Of course, one should also be aware that such compilers may behave nonsensically in corner cases where adherence to the Standard would require them to forgo optimizations that are unsound but would "usually" be safe.


 