
Does (p+x)-x always result in p for pointer p and integer x in gcc linux x86-64 C++

Suppose we have:

char* p;
int   x;

As recently discussed in another question, arithmetic (including comparison operations) on invalid pointers can generate unexpected behavior in gcc Linux x86-64 C++. This new question is specifically about the expression (p+x)-x: can it generate unexpected behavior (i.e., a result other than p) in any existing GCC version running on x86-64 Linux?

Note that this question is just about pointer arithmetic; there is absolutely no intention to access the location designated by *(p+x), which obviously would be unpredictable in general.

The practical interest here is non-zero-based arrays. Note that in these applications, (p+x) and the later subtraction of x happen in different places in the code.

If recent GCC versions on x86-64 can be shown to never generate unexpected behavior for (p+x)-x, then these versions can be certified for non-zero-based arrays, and future versions that do generate unexpected behavior could be modified or configured to support this certification.
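For concreteness, here is a minimal sketch of the non-zero-based array pattern in question (the OneBased type and its names are illustrative only, not taken from any existing code):

#include <cstddef>

// Illustrative sketch only: a 1-based view over ordinary 0-based storage.
// Constructing `biased` is the (p+x) step; indexing later re-adds the
// offset, so the "-x" effectively happens elsewhere in the code.
struct OneBased {
    int* biased;    // storage - 1; may not point into any object
    explicit OneBased(int* storage) : biased(storage - 1) {}
    int& operator[](std::size_t i) { return biased[i]; }   // valid for i in 1..n
};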

UPDATE

For the practical case described above, we could also assume p itself is a valid pointer and p != NULL.

Here's the list of GCC extensions: https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html

There is an extension for pointer arithmetic: GCC allows performing pointer arithmetic on void pointers. (Not the extension you're asking about.)
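For reference, a quick illustration of what that documented extension covers (g++ accepts this, warning about it only under -Wpointer-arith / -Wpedantic):

#include <cstdio>

int main() {
    int arr[4] = {10, 20, 30, 40};
    void* v = arr;
    // GNU extension: arithmetic on void* behaves as if sizeof(void) == 1,
    // so adding sizeof(int) bytes lands on arr[1].
    void* next = v + sizeof(int);
    std::printf("%d\n", *static_cast<int*>(next));   // prints 20
    return 0;
}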

So, gcc treats the behavior for the pointer arithmetic you're asking about as undefined under the same conditions as described in the language standard.

You can look through there and see if there is anything I missed that's relevant to your question.

You do not understand what "undefined behavior" is, and I cannot blame you, given that it is often poorly explained. This is how the standard defines undefined behavior, in section 3.27 of [intro.defs]:

behavior for which this document imposes no requirements

That's it. Nothing less, nothing more. The standard can be thought of as a series of constraints for compiler vendors to follow when translating valid programs. When there's undefined behavior, all bets are off.

Some people say that undefined behavior can lead to your program spawning dragons or reformatting your hard drive, but I find that to be a bit of a strawman. More realistically, something like going past the bounds of an array can result in a segfault (by triggering a page fault). Sometimes undefined behavior allows compilers to make optimizations that change the behavior of your program in unexpected ways, since there's nothing saying the compiler can't.
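A classic illustration of that last point (a textbook out-of-bounds example, not code from the question):

static int table[4] = {1, 2, 3, 4};

bool exists_in_table(int v) {
    // Off-by-one: when i == 4 this reads table[4], which is out of bounds (UB).
    // Since falling out of the loop is only reachable through that UB, an
    // optimizer is allowed to assume it never happens and may compile this
    // whole function down to an unconditional "return true".
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v)
            return true;
    }
    return false;
}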

The point is that compilers do not "generate undefined behavior". Undefined behavior exists in your program.

What I meant is, if GCC has a great feature (specifically, math on invalid pointers) that's not currently named, we can give it a name, and then demand it in future versions too.

Then it would be a non-standard extension, and one would expect it to be documented. I also highly doubt that such a feature would be in high demand, given that it would not only allow people to write unsafe code, but would also make it extremely hard to write portable programs.

Yes: in gcc5.x and later specifically, that expression is optimized very early to just p, even with optimization disabled, regardless of any possible runtime UB.

This happens even with a static array of known size and a compile-time-constant offset that is clearly out of bounds. gcc -fsanitize=undefined doesn't insert any instrumentation to look for it either, and there are no warnings at -Wall -Wextra -Wpedantic.

int *add(int *p, long long x) {
    return (p+x) - x;
}

int *visible_UB(void) {
    static int arr[100];
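    // forming arr+200 is already UB: arr only has 100 elements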
    return (arr+200) - 200;
}

Using gcc -fdump-tree-original to dump GCC's internal representation of the program logic before any optimization passes shows that in gcc5.x and newer this transformation has already happened by that point (and it happens even at -O0).

;; Function int* add(int*, long long int) (null)
;; enabled by -tree-original

return <retval> = p;


;; Function int* visible_UB() (null)
;; enabled by -tree-original
{
  static int arr[100];

    static int arr[100];
  return <retval> = (int *) &arr;
}

That's from the Godbolt compiler explorer, with gcc8.3 at -O0.

The x86-64 asm output is just:

; g++8.3 -O0 
add(int*, long long):
    mov     QWORD PTR [rsp-8], rdi
    mov     QWORD PTR [rsp-16], rsi    # spill args
    mov     rax, QWORD PTR [rsp-8]     # reload only the pointer
    ret
visible_UB():
    mov     eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
    ret

-O3 output is of course just mov rax, rdi


gcc4.9 and earlier only do this optimization in a later pass, and not at -O0: the tree dump still includes the subtraction, and the x86-64 asm is:

# g++4.9.4 -O0
add(int*, long long):
    mov     QWORD PTR [rsp-8], rdi
    mov     QWORD PTR [rsp-16], rsi
    mov     rax, QWORD PTR [rsp-16]
    lea     rdx, [0+rax*4]            # RDX = x*4 = x*sizeof(int)
    mov     rax, QWORD PTR [rsp-16]
    sal     rax, 2
    neg     rax                       # RAX = -(x*4)
    add     rdx, rax                  # RDX = x*4 + (-(x*4)) = 0
    mov     rax, QWORD PTR [rsp-8]
    add     rax, rdx                  # p += x + (-x)
    ret

visible_UB():       # but constants still optimize away at -O0
    mov     eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
    ret

This does line up with the -fdump-tree-original output:

return <retval> = p + ((sizetype) ((long unsigned int) x * 4) + -(sizetype) ((long unsigned int) x * 4));

If x*4 overflows, you'll still get the right answer: the offset and its negation cancel exactly in unsigned (modulo 2^64) arithmetic. In practice I can't think of a way to write a function where this UB would lead to an observable change in behaviour.
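A tiny illustration of why that wrap-around is harmless (this assumes a 64-bit unsigned sizetype, as on x86-64):

#include <cassert>
#include <cstdint>

int main() {
    // Pretend this is x*4 after wrapping around in 64-bit unsigned arithmetic.
    std::uint64_t scaled = 0xFFFFFFFFFFFFFFF0u;
    // Unsigned arithmetic is modulo 2^64, so an offset and its negation
    // always cancel exactly, leaving the original pointer value unchanged.
    assert(scaled + (std::uint64_t{0} - scaled) == 0);
    return 0;
}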


As part of a larger function, a compiler would be allowed to infer some range info, such as that p[x] is part of the same object as p[0], so reading memory in between / out that far is allowed and won't segfault, e.g. allowing auto-vectorization of a search loop.

But I doubt that gcc even looks for that, let alone takes advantage of it.

(Note that your question title was specific to gcc targeting x86-64 on Linux, not about whether similar things are safe in gcc in general, e.g. when the addition and the subtraction are done in separate statements. That's probably still safe in practice, but it won't be optimized away almost immediately after parsing. And it's definitely not a statement about C++ in general.)


I highly recommend not doing this. Use uintptr_t to hold pointer-like values that aren't actual valid pointers, like you're doing in the updates to your answer on "C++ gcc extension for non-zero-based array pointer allocation?".
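For example, a minimal sketch of that uintptr_t approach (the ShiftedView type and its layout are illustrative, not taken from that answer):

#include <cstddef>
#include <cstdint>

// Illustrative only: keep the biased "origin" as an integer, so no invalid
// pointer value is ever formed; a real pointer is reconstructed only for
// in-range indices, where it points back inside the original object.
struct ShiftedView {
    std::uintptr_t origin;   // (uintptr_t)storage - lo * sizeof(int)

    int& at(std::ptrdiff_t i) const {
        return *reinterpret_cast<int*>(origin + i * sizeof(int));
    }
};

ShiftedView make_view(int* storage, std::ptrdiff_t lo) {
    return ShiftedView{ reinterpret_cast<std::uintptr_t>(storage) - lo * sizeof(int) };
}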
