
Compiler optimization creating a syscall?

I am compiling a fairly sophisticated application in two modes: Debug and Release. The main difference, as I see it, is -O0 vs -O3 (I can provide the relevant part of the makefile if needed). I am trying to avoid syscalls as much as possible, because I am simulating this application in syscall emulation mode (no OS running underneath). The problem I am currently having is that in Release mode the compiler generates an extra socket syscall, which I would prefer to avoid (and which does not happen in Debug mode).

The reason that I think the socket might be created is that I am using pthreads and two of my threads are communicating through a volatile char*. So I'm guessing maybe the compiler is trying to implement it in a fancy way when I set the -O3 flag? But I'm not sure if that is a reasonable assumption.

  1. Is it possible that the socket syscall is being generated because of the -O3 flag? (doesn't make too much sense)
  2. If so, how can I hint to the compiler to avoid generating this syscall?

EDIT: BTW the code is in C and C++

EDIT: The code is statically linked against the following shared libraries:

*special pthreads library*

Also, I found where in the binary the call to socket is happening:

8c716:       db28            blt.n   8c76a <openlog_internal+0xf2>
8c718:       f8d9 1008       ldr.w   r1, [r9, #8]
8c71c:       4620            mov     r0, r4
8c71e:       2200            movs    r2, #0
8c720:       f441 2100       orr.w   r1, r1, #524288 ; 0x80000
8c724:       f001 e97c       blx     8da20 <__socket>
8c728:       4b20            ldr     r3, [pc, #128]  ; (8c7ac <openlog_internal+0x134>)
8c72a:       681b            ldr     r3, [r3, #0]
8c72c:       f8c9 0004       str.w   r0, [r9, #4]
8c730:       b943            cbnz    r3, 8c744 <openlog_internal+0xcc>
8c732:       1c43            adds    r3, r0, #1

EDIT: I found out why this is happening (see my answer below). If anyone has an explanation as to why the compiler behaves like that please share!!!

Although one can imagine such an optimization, I have never heard of one and I really doubt it exists, because system calls are usually very expensive.

If you are on a *nix system, you can verify it by looking for undefined symbols with nm

nm -u file1.o file2.o | grep socket

should show somewhere the missing socket symbol as

        U socket

if there is somewhere a call to socket.

As I mentioned, I doubt that any optimization inserts a system call, so I expect no output from the command line above.


On my system (Ubuntu 12.04, gcc 4.6), I found the following note in man gcc

-O2 Optimize even more. ...
NOTE: In Ubuntu 8.10 and later versions, -D_FORTIFY_SOURCE=2 is set by default, and is activated when -O is set to 2 or higher. This enables additional compile-time and run-time checks for several libc functions. To disable, specify either -U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0.

So, maybe through this or a similar mechanism, there is some code included when the optimization is set to -O2 or -O3 .
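To check whether a build is actually fortified, one option is to ask the C library headers. The sketch below assumes glibc, whose <features.h> sets an internal macro __USE_FORTIFY_LEVEL after looking at _FORTIFY_SOURCE and the optimization level. The function name fortify_level is made up for illustration, and on other C libraries the macro may simply be absent.

```c
#include <stdio.h>   /* any glibc header pulls in <features.h> */

/* Hypothetical helper: reports the fortify level the headers settled on.
 * Assumes glibc; __USE_FORTIFY_LEVEL is a glibc-internal macro and is
 * treated as "not fortified" when it is missing. */
int fortify_level(void)
{
#if defined(__USE_FORTIFY_LEVEL)
    return __USE_FORTIFY_LEVEL;   /* 0 means the checks are disabled */
#else
    return 0;
#endif
}
```

Building the same file once with -O0 and once with -O3 and printing fortify_level() should show whether the two modes really differ in this respect.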

After a whole day of debugging, it turns out that arm-cross-gcc compiles strcpy() differently under -O0 than under -O1, -O2, and -O3 when the string you are copying from is a volatile char*. With -O0 the result is standard user-mode assembly, whereas with -O1, -O2, or -O3 the binary ends up with extra syscalls such as socket, connect, and send.

So, after all, my initial hunch is justified:

"The reason that I think the socket might be created is that I am using pthreads and two of my threads are communicating through a volatile char*. So I'm guessing maybe the compiler is trying to implement it in a fancy way when I set the -O3 flag? But I'm not sure if that is a reasonable assumption."

EDIT: Here are some observations to support this claim:

I compiled my code in 4 versions

 1. without strcpy() -O0 => obj.o0
 2. with    strcpy() -O0 => obj_strcpy.o0
 3. without strcpy() -O3 => obj.o3
 4. with    strcpy() -O3 => obj_strcpy.o3

I ran nm -u on all of the above.

Here are the diffs:

$> diff obj.o0 obj_strcpy.o0
$> diff obj.o3 obj_strcpy.o3
>          U __strcpy_chk

This means that when you add strcpy() to your code and compile with -O0, no external symbols are added, whereas when you add strcpy() and compile with -O3, the undefined symbol __strcpy_chk (the line marked U in the nm output) is added to the object file. I will look into the implementation of __strcpy_chk on ARM to figure out where the syscalls are coming from. As of right now it seems like __strcpy_chk is doing buffer overflow checks - here is the reference:
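To make the "buffer overflow check" concrete, here is a minimal sketch of what a checked copy of this kind does. It is an illustration only, not the library's code: the name sketch_strcpy_chk is hypothetical, and it returns NULL on overflow so it can be tested, whereas a real checked variant would report the error and abort the program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a fortified strcpy. dest_size is the size of
 * the destination buffer, which the compiler supplies when it can work it
 * out at compile time. Returns dest on success, NULL if src does not fit. */
char *sketch_strcpy_chk(char *dest, const char *src, size_t dest_size)
{
    assert(dest != NULL && src != NULL);

    size_t len = strlen(src);
    if (len >= dest_size) {
        return NULL;              /* a real implementation aborts here */
    }
    memcpy(dest, src, len + 1);   /* copy the terminator as well */
    return dest;
}
```

The copy itself needs no syscalls, so if extra ones appear, the error-reporting path of the real function is the more likely place to look.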


EDIT: So there are two solutions so far. One, proposed by Olaf Dietsche, is to add another compiler flag alongside -O3 (-U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0, as in the man page note quoted above). The other is to avoid strcpy() altogether and use something like the following:

for(int i=0;i<64;i++)
  cmd[i] = msg[i];
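The loop above copies a fixed 64 bytes whatever the message length. A slightly more defensive version, sketched here with a hypothetical helper name and signature, stops at the terminator and never writes past the destination. It uses no libc string functions, so no fortified replacement gets pulled in. Note that volatile by itself does not synchronize the two threads; this only addresses the copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy a NUL-terminated message out of a volatile
 * buffer into dst, truncating if needed. Always NUL-terminates dst.
 * Returns the number of characters copied, excluding the terminator. */
size_t copy_from_volatile(char *dst, const volatile char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t i = 0;
    while (i + 1 < dst_size) {    /* keep one byte for the terminator */
        char c = src[i];          /* exactly one volatile read per byte */
        if (c == '\0') {
            break;
        }
        dst[i++] = c;
    }
    dst[i] = '\0';
    return i;
}
```

With the buffers from the loop above, the call would look like copy_from_volatile(cmd, msg, 64), assuming cmd holds at least 64 bytes.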
