
Is dereferencing a pointer that's equal to nullptr undefined behavior by the standard?

A blog author has started a discussion about null pointer dereferencing:

I've posted some counterarguments here:

His main line of reasoning quoting the standard is this:

The '&podhd->line6' expression is undefined behavior in the C language when 'podhd' is a null pointer.

The C99 standard says the following about the '&' address-of operator (6.5.3.2 "Address and indirection operators"):

The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.

The expression 'podhd->line6' is clearly not a function designator, the result of a [] or * operator. It is an lvalue expression. However, when the 'podhd' pointer is NULL, the expression does not designate an object since 6.3.2.3 "Pointers" says:

If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

When "an lvalue does not designate an object when it is evaluated, the behavior is undefined" (C99 6.3.2.1 "Lvalues, arrays, and function designators"):

An lvalue is an expression with an object type or an incomplete type other than void; if an lvalue does not designate an object when it is evaluated, the behavior is undefined.

So, the same idea in brief:

When -> was executed on the pointer, it evaluated to an lvalue where no object exists, and as a result the behavior is undefined.
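To make the disputed expression concrete, here is a minimal sketch in C. The struct layouts and function names are hypothetical stand-ins (the real definitions behind 'podhd' and 'line6' are not shown in the question); the sketch only illustrates ways to get the member's address or offset without ever applying -> to a null pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the types behind 'podhd' and 'line6'. */
struct line6_dev { int id; };
struct podhd_dev { int flags; struct line6_dev line6; };

/* Strict variant: a null argument is treated as a caller bug. */
struct line6_dev *podhd_line6(struct podhd_dev *podhd)
{
    assert(podhd != NULL);
    return &podhd->line6;
}

/* Tolerant variant: test for null before '->' is ever applied. */
struct line6_dev *podhd_line6_or_null(struct podhd_dev *podhd)
{
    if (podhd == NULL)
        return NULL;
    return &podhd->line6;
}

/* Offset of the member, computed without any pointer at all. */
size_t podhd_line6_offset(void)
{
    return offsetof(struct podhd_dev, line6);
}
```

With these helpers, the question of what '&podhd->line6' means for a null 'podhd' never arises, because the null test comes first and offsetof covers the "I only need the offset" case.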

This question is purely about the language. I'm not asking whether a given system allows one to tamper with what lies at address 0 in any language.

As far as I can see, there's no restriction on dereferencing a pointer variable whose value is equal to nullptr, even though comparisons of a pointer against the nullptr (or (void *) 0) constant can vanish in optimizations in certain situations because of the paragraphs quoted above. That looks like a separate issue, though: it doesn't prevent dereferencing a pointer whose value is equal to nullptr. Note that I've checked other SO questions and answers (I particularly like this set of quotations), as well as the standard quotes above, and I didn't stumble upon anything that clearly implies from the standard that, if a pointer ptr compares equal to nullptr, dereferencing it is undefined behavior.

At most, what I get is that dereferencing the constant (or its cast to any pointer type) is UB, but nothing is said about a variable that's bit-equal to the value that comes from nullptr.

I'd like to clearly separate the nullptr constant from a pointer variable that holds a value equal to it. But an answer that addresses both cases is ideal.

I do realise that optimizations can kick in when there are comparisons against nullptr, etc., and may simply strip code based on that.

If the conclusion is that dereferencing ptr is definitely UB when it equals the value of nullptr, another question follows:

Do C and C++ standards imply that a special value in the address space must exist solely to represent the value of null pointers?

As you quote C, dereferencing a null pointer is clearly undefined behavior from this Standard quote (emphasis mine):

(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"

102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."

The exact same text appears in C99, and similar wording in C89 / C90.
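As a hedged illustration of what the footnote means for everyday code, the sketch below guards against two of the three invalid values it lists. The helper names are my own, not from any library; the third case (an object past the end of its lifetime) cannot be detected at run time in portable C, so it is not covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Null case: apply unary * only after ruling out a null pointer. */
int read_or(const int *p, int fallback)
{
    return (p != NULL) ? *p : fallback;
}

/* Alignment case: read an int stored at an arbitrary byte address.
   memcpy copies the bytes, so no misaligned int pointer is ever
   dereferenced. */
int read_unaligned(const unsigned char *bytes)
{
    int value;
    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Both helpers keep the program inside defined behavior, so the compiler has no license to assume anything about the pointer beyond what the checks establish.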

C++

dcl.ref/5.

There shall be no references to references, no arrays of references, and no pointers to references. The declaration of a reference shall contain an initializer (8.5.3) except when the declaration contains an explicit extern specifier (7.1.1), is a class member (9.2) declaration within a class definition, or is the declaration of a parameter or a return type (8.3.5); see 3.1. A reference shall be initialized to refer to a valid object or function. [ Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by indirection through a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. — end note ]

The note is of interest, as it explicitly says dereferencing a null pointer is undefined.

I'm sure it says it somewhere else in a more relevant context, but this is good enough.

As I see it, the extent to which a NULL value may be dereferenced is deliberately left platform-dependent and unspecified, because of what C11 6.3.2.3p5 and p6 leave implementation-defined. This is mostly to support freestanding implementations used for developing boot code for a platform, as the OP indicates in his rebuttal link, but it has applications for hosted implementations too.

Re:
(C11, 6.5.3.2p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.102)"

102): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."

This is phrased as it is, afaict, because each of the cases in the footnote may NOT be invalid for the specific platform a compiler is targeting. If there's a defect there, it's that "invalid value" should be italicized and qualified by "implementation-defined". For the alignment case, a platform may be able to access any type at any address and so have no alignment requirements, especially if address rollover is supported; and a platform may assume an object's lifetime only ends after the application has exited, allocating a new frame via malloc() for automatic variables on each function call.

For null pointers, at boot time a platform may have expectations that structures the processor uses have specific physical addresses, including at address 0, and get represented as object pointers in source code, or may require the function defining the boot process to use a base address of 0. If the standard didn't permit dereferences like '&podhd->line6', where a platform required podhd to have a base address of 0, then assembly language would be needed to access that structure. Similarly, a soft reboot function might need to dereference a 0 valued pointer as a void function invocation. A hosted implementation may consider 0 the base of an executable image, and map a NULL pointer in source code to the header of that image, after loading, as the struct required to be at logical address 0 for that instance of the C virtual machine.

What the standard calls pointers are more handles into the virtual address space of the virtual machine, where object handles have more requirements on what operations are permitted for them. How the compiler emits code that takes the requirements of these handles into account for a specific processor is left undefined. What is efficient for one processor may not be for another, after all.

The requirement on (void *)0 is rather that, wherever the source uses (void *)0, explicitly or by referencing NULL, the compiler emits code guaranteeing that the value actually stored is one that cannot point to any valid function definition or object under any mapping. This does not have to be a 0! Similarly, casts of (void *)0 to (obj_type) and (func_type) are only required to yield values that evaluate as addresses the compiler guarantees are not in use at that point for objects or code. The difference with the latter is that these are unused, not invalid, so they are capable of being dereferenced in the defined manner.
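The "does not have to be a 0" point can be probed with a small C sketch. The function names are hypothetical. The first function asserts only what holds on every conforming implementation; the second reports an implementation-specific fact about how one null pointer happens to be stored, so its result is deliberately not assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Holds everywhere: the constant 0 converted to a pointer type
   compares equal to NULL and to (void *)0. */
void check_null_constants(void)
{
    int *from_zero = 0;
    int *from_null = NULL;
    assert(from_zero == from_null);
    assert(from_zero == (void *)0);
}

/* Implementation-specific: returns 1 if the bytes stored for this
   null pointer are all zero, 0 otherwise. The standard does not
   promise either answer. */
int null_is_all_bits_zero(void)
{
    void *null_ptr = NULL;
    unsigned char zeros[sizeof null_ptr] = {0};
    return memcmp(&null_ptr, zeros, sizeof null_ptr) == 0;
}
```

On mainstream desktop platforms the second function typically returns 1, but code that relies on that (for example, using memset to produce null pointers) is leaning on the implementation rather than on the standard.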

The code that tests for pointer equality would then check, when one operand is one of these values, whether the other is also one of the 3, rather than just comparing bit patterns, because this scoreboards them with the RTTI of being a (null *) type, distinct from the void, obj, and func pointer types to defined entities. The standard could be more explicit that it is a distinct type, even if unnamed because compilers only use it internally, but I suppose this is considered obvious from "null pointer" being italicized. Effectively, imo, a '0' in these contexts is an additional keyword token of the compiler, due to the additional requirement of it identifying the (null *) type, but it isn't characterized as such because this would complicate the definition of < identifiers >.

This stored value can be SIZE_MAX as easily as a 0, for a (void *)0, in emitted application code when implementations, for example, define the range 0 to SIZE_MAX-4*sizeof(void *) of virtual machine handles as what is valid for code and data. The NULL macro may even be defined as (void *)SIZE_MAX, and it would be up to the compiler to figure out from context that this has the same semantics as 0. The casting code is responsible for noting it is the chosen value, in pointer <--> pointer casts, and for supplying what is appropriate as an object or function pointer. Casts from pointer <--> integer, implicit or explicit, have similar check-and-supply requirements, especially in unions where a (u)intptr_t field overlays a (type *) field. Portable code can guard against compilers not doing this properly with an explicit *(ptr==NULL?(type *)0:ptr) expression.
