
C Standard Library Functions vs. System Calls. Which is `open()`?

I know fopen() is in the C standard library, so that I can definitely call the fopen() function in a C program. What I am confused about is why I can call the open() function as well. open() should be a system call, so it is not a C function in the standard library. As I am successfully able to call the open() function, am I calling a C function or a system call?

EJP's comments to the question and Steve Summit's answer are exactly to the point: open() is both a syscall and a function in the C library (it is specified by POSIX rather than by the ISO C standard, but on Linux it lives in the same libc); fopen() is a function in the standard C library that sets up a file handle (a data structure of type FILE that contains additional stuff like optional buffering) and internally calls open().
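To make the relationship concrete, here is a minimal sketch for a hosted POSIX system. The helper names write_with_open() and descriptor_behind_fopen() are hypothetical, made up for this illustration. The first uses open()/write()/close() directly; the second uses fopen() and then asks, via the POSIX fileno() function, which descriptor the FILE wraps.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write len bytes of msg to path using the low-level POSIX calls.
 * Returns the number of bytes written, or -1 on error. */
static long write_with_open(const char *path, const char *msg, size_t len)
{
    assert(path && msg);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    ssize_t n = write(fd, msg, len);
    close(fd);
    return (long)n;
}

/* Open path with standard I/O and return the descriptor inside the FILE,
 * or -1 on error. */
static int descriptor_behind_fopen(const char *path)
{
    assert(path);

    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;

    int fd = fileno(fp);   /* the descriptor the stream wraps */
    fclose(fp);            /* closing the stream also closes the descriptor */
    return fd;
}
```

Calling descriptor_behind_fopen() on a writable path should return a small non-negative integer, the same kind of value open() itself returns. The FILE is a layer on top of a descriptor, not a replacement for it.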


In the hope of furthering understanding, I shall show hello.c , an example Hello world program written in C for Linux on 64-bit x86 (x86-64 AKA AMD64 architecture), which does not use the standard C library at all.

First, hello.c needs to define some macros with inline assembly for us to be able to call the syscalls. These are very architecture- and operating-system-dependent, which is why this only works in Linux on the x86-64 architecture:

/* Freestanding Hello World example in Linux on x86_64/x86.
 * Compile using
 *      gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello
*/
#define STDOUT_FILENO 1
#define EXIT_SUCCESS  0

#ifndef __x86_64__
#error  This program only works on x86_64 architecture!
#endif

#define SYS_write    1
#define SYS_exit    60

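/* The syscall instruction itself clobbers rcx and r11. The "memory"
 * clobber and __volatile__ tell the compiler that the kernel may read
 * or write memory, so the asm must not be reordered or optimized away. */
#define SYSCALL1_NORET(nr, arg1) \
    __asm__ __volatile__ ( "syscall\n\t" \
            : \
            : "a" (nr), "D" (arg1) \
            : "rcx", "r11", "memory" )

#define SYSCALL3(retval, nr, arg1, arg2, arg3) \
    __asm__ __volatile__ ( "syscall\n\t" \
            : "=a" (retval) \
            : "a" (nr), "D" (arg1), "S" (arg2), "d" (arg3) \
            : "rcx", "r11", "memory" )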

The Freestanding in the comment at the beginning of the file refers to "freestanding execution environment" ; it is the case when there is no C library available at all. For example, the Linux kernel is written the same way. The normal environment we are familiar with is called "hosted execution environment" , by the way.

Next, we can define two functions, or "wrappers", around the syscalls:

static inline void my_exit(int retval)
{
    SYSCALL1_NORET(SYS_exit, retval);
}

static inline int my_write(int fd, const void *data, int len)
{
    int retval;

    if (fd == -1 || !data || len < 0)
        return -1;

    SYSCALL3(retval, SYS_write, fd, data, len);

    if (retval < 0)
        return -1;

    return retval;
}

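Above, my_exit() is roughly equivalent to the _exit() function (the C standard library exit() additionally runs atexit() handlers and flushes standard I/O buffers before terminating), and my_write() to the POSIX write() function.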

The C language does not define any kind of a way to do a syscall, so that is why we always need a "wrapper" function of some sort. (The GNU C library does provide a syscall() function for us to do any syscall we wish -- but the point of this example is to not use the C library at all.)
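For comparison, this is roughly what using the generic syscall() function looks like in a normal hosted program. It is only a sketch: write_via_syscall() is a hypothetical helper name, and _GNU_SOURCE is defined so that the header declares syscall().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke the write syscall through the generic syscall() function
 * instead of the dedicated write() wrapper.
 * Returns the number of bytes written, or -1 on error (errno is set). */
static long write_via_syscall(int fd, const void *data, unsigned long len)
{
    assert(data);
    return syscall(SYS_write, fd, data, len);
}
```

A call like write_via_syscall(STDOUT_FILENO, "Hello, world!\n", 14) should behave like the corresponding write() call. The difference is only that the syscall number is passed explicitly rather than being baked into a dedicated wrapper.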

The wrapper functions always involve a bit of (inline) assembly. Again, since C does not have a built-in way to do a syscall, we need to "extend" the language by adding some assembly code. This (inline) assembly, and the syscall numbers, are what make this example operating system and architecture dependent. And yes: the GNU C library, for example, contains the equivalent wrappers for quite a few architectures.

Some of the functions in the C library do not use any syscalls. We also need one, the equivalent of strlen() :

static inline int my_strlen(const char *str)
{
    int len = 0;

    if (!str)
        return -1;

    while (*str++)
        len++;

    return len;
}

Note that there is no NULL used anywhere in the above code. That is because NULL is a macro defined in the standard headers (such as <stddef.h>), none of which this example includes. Instead, I'm relying on "logical null": (!pointer) is true if and only if pointer is a null pointer, which is a zero pointer on all architectures in Linux. I could have defined NULL myself, but I didn't, in the hopes that somebody might notice the lack of it.

Finally, main() itself is something the GNU C library calls, as in Linux, the actual start point of the binary is called _start . The _start is provided by the hosted runtime environment, and initializes the C library data structures and does other similar preparations. Our example program is so simple we do not need it, so we can just put our simple main program part into _start instead:

void _start(void)
{
    const char *msg = "Hello, world!\n";
    my_write(STDOUT_FILENO, msg, my_strlen(msg));
    my_exit(EXIT_SUCCESS);
}

If you put all of the above together, and compile it using

gcc -march=x86-64 -mtune=generic -m64 -ffreestanding -nostdlib -nostartfiles hello.c -o hello

per the comment at the start of the file, you will end up with a small (about two kilobytes) static binary, that when run,

./hello

outputs

Hello, world!

You can use file hello to examine the contents of the file. You could run strip hello to remove all (unneeded) symbols, reducing the file size further down to about one and a half kilobytes, if file size was really important. (It will make the object dump less interesting, however, so before you do that, check out the next step first.)

We can use objdump -x hello to examine the sections in the file:

hello:     file format elf64-x86-64
hello
architecture: i386:x86-64, flags 0x00000112:
EXEC_P, HAS_SYMS, D_PAGED
start address 0x00000000004001e1

Program Header:
    LOAD off    0x0000000000000000 vaddr 0x0000000000400000 paddr 0x0000000000400000 align 2**21
         filesz 0x00000000000002f0 memsz 0x00000000000002f0 flags r-x
    NOTE off    0x0000000000000120 vaddr 0x0000000000400120 paddr 0x0000000000400120 align 2**2
         filesz 0x0000000000000024 memsz 0x0000000000000024 flags r--
EH_FRAME off    0x000000000000022c vaddr 0x000000000040022c paddr 0x000000000040022c align 2**2
         filesz 0x000000000000002c memsz 0x000000000000002c flags r--
   STACK off    0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
         filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .note.gnu.build-id 00000024  0000000000400120  0000000000400120  00000120  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .text         000000d9  0000000000400144  0000000000400144  00000144  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .rodata       0000000f  000000000040021d  000000000040021d  0000021d  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .eh_frame_hdr 0000002c  000000000040022c  000000000040022c  0000022c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .eh_frame     00000098  0000000000400258  0000000000400258  00000258  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .comment      00000034  0000000000000000  0000000000000000  000002f0  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
0000000000400120 l    d  .note.gnu.build-id     0000000000000000 .note.gnu.build-id
0000000000400144 l    d  .text  0000000000000000 .text
000000000040021d l    d  .rodata        0000000000000000 .rodata
000000000040022c l    d  .eh_frame_hdr  0000000000000000 .eh_frame_hdr
0000000000400258 l    d  .eh_frame      0000000000000000 .eh_frame
0000000000000000 l    d  .comment       0000000000000000 .comment
0000000000000000 l    df *ABS*  0000000000000000 hello.c
0000000000400144 l     F .text  0000000000000016 my_exit
000000000040015a l     F .text  000000000000004e my_write
00000000004001a8 l     F .text  0000000000000039 my_strlen
0000000000000000 l    df *ABS*  0000000000000000 
000000000040022c l       .eh_frame_hdr  0000000000000000 __GNU_EH_FRAME_HDR
00000000004001e1 g     F .text  000000000000003c _start
0000000000601000 g       .eh_frame      0000000000000000 __bss_start
0000000000601000 g       .eh_frame      0000000000000000 _edata
0000000000601000 g       .eh_frame      0000000000000000 _end

The .text section contains our code, and .rodata immutable constants; here, just the Hello, world! string literal. The rest of the sections are stuff the linker adds and the system uses. We can see that we have f (hex) = 15 bytes of read-only data, and d9 (hex) = 217 bytes of code; the rest of the file (about a kilobyte or so) is ELF stuff added by the linker for the kernel to use when executing this binary.

We can even examine the actual assembly code contained in hello , by running objdump -d hello :

hello:     file format elf64-x86-64


Disassembly of section .text:

0000000000400144 <my_exit>:
  400144:       55                      push   %rbp
  400145:       48 89 e5                mov    %rsp,%rbp
  400148:       89 7d fc                mov    %edi,-0x4(%rbp)
  40014b:       b8 3c 00 00 00          mov    $0x3c,%eax
  400150:       8b 55 fc                mov    -0x4(%rbp),%edx
  400153:       89 d7                   mov    %edx,%edi
  400155:       0f 05                   syscall 
  400157:       90                      nop
  400158:       5d                      pop    %rbp
  400159:       c3                      retq   

000000000040015a <my_write>:
  40015a:       55                      push   %rbp
  40015b:       48 89 e5                mov    %rsp,%rbp
  40015e:       89 7d ec                mov    %edi,-0x14(%rbp)
  400161:       48 89 75 e0             mov    %rsi,-0x20(%rbp)
  400165:       89 55 e8                mov    %edx,-0x18(%rbp)
  400168:       83 7d ec ff             cmpl   $0xffffffff,-0x14(%rbp)
  40016c:       74 0d                   je     40017b <my_write+0x21>
  40016e:       48 83 7d e0 00          cmpq   $0x0,-0x20(%rbp)
  400173:       74 06                   je     40017b <my_write+0x21>
  400175:       83 7d e8 00             cmpl   $0x0,-0x18(%rbp)
  400179:       79 07                   jns    400182 <my_write+0x28>
  40017b:       b8 ff ff ff ff          mov    $0xffffffff,%eax
  400180:       eb 24                   jmp    4001a6 <my_write+0x4c>
  400182:       b8 01 00 00 00          mov    $0x1,%eax
  400187:       8b 7d ec                mov    -0x14(%rbp),%edi
  40018a:       48 8b 75 e0             mov    -0x20(%rbp),%rsi
  40018e:       8b 55 e8                mov    -0x18(%rbp),%edx
  400191:       0f 05                   syscall 
  400193:       89 45 fc                mov    %eax,-0x4(%rbp)
  400196:       83 7d fc 00             cmpl   $0x0,-0x4(%rbp)
  40019a:       79 07                   jns    4001a3 <my_write+0x49>
  40019c:       b8 ff ff ff ff          mov    $0xffffffff,%eax
  4001a1:       eb 03                   jmp    4001a6 <my_write+0x4c>
  4001a3:       8b 45 fc                mov    -0x4(%rbp),%eax
  4001a6:       5d                      pop    %rbp
  4001a7:       c3                      retq   

00000000004001a8 <my_strlen>:
  4001a8:       55                      push   %rbp
  4001a9:       48 89 e5                mov    %rsp,%rbp
  4001ac:       48 89 7d e8             mov    %rdi,-0x18(%rbp)
  4001b0:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  4001b7:       48 83 7d e8 00          cmpq   $0x0,-0x18(%rbp)
  4001bc:       75 0b                   jne    4001c9 <my_strlen+0x21>
  4001be:       b8 ff ff ff ff          mov    $0xffffffff,%eax
  4001c3:       eb 1a                   jmp    4001df <my_strlen+0x37>
  4001c5:       83 45 fc 01             addl   $0x1,-0x4(%rbp)
  4001c9:       48 8b 45 e8             mov    -0x18(%rbp),%rax
  4001cd:       48 8d 50 01             lea    0x1(%rax),%rdx
  4001d1:       48 89 55 e8             mov    %rdx,-0x18(%rbp)
  4001d5:       0f b6 00                movzbl (%rax),%eax
  4001d8:       84 c0                   test   %al,%al
  4001da:       75 e9                   jne    4001c5 <my_strlen+0x1d>
  4001dc:       8b 45 fc                mov    -0x4(%rbp),%eax
  4001df:       5d                      pop    %rbp
  4001e0:       c3                      retq   

00000000004001e1 <_start>:
  4001e1:       55                      push   %rbp
  4001e2:       48 89 e5                mov    %rsp,%rbp
  4001e5:       48 83 ec 10             sub    $0x10,%rsp
  4001e9:       48 c7 45 f8 1d 02 40    movq   $0x40021d,-0x8(%rbp)
  4001f0:       00 
  4001f1:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  4001f5:       48 89 c7                mov    %rax,%rdi
  4001f8:       e8 ab ff ff ff          callq  4001a8 <my_strlen>
  4001fd:       89 c2                   mov    %eax,%edx
  4001ff:       48 8b 45 f8             mov    -0x8(%rbp),%rax
  400203:       48 89 c6                mov    %rax,%rsi
  400206:       bf 01 00 00 00          mov    $0x1,%edi
  40020b:       e8 4a ff ff ff          callq  40015a <my_write>
  400210:       bf 00 00 00 00          mov    $0x0,%edi
  400215:       e8 2a ff ff ff          callq  400144 <my_exit>
  40021a:       90                      nop
  40021b:       c9                      leaveq 
  40021c:       c3                      retq  

The assembly itself is not really that interesting, except that in my_write and my_exit you can see how the inline assembly generated by the SYSCALL...() macro just loads the variables into specific registers, and does the "do syscall" -- which just happens to be an x86-64 assembly instruction also called syscall here; in 32-bit x86 architecture, it is int $0x80 , and yet something else in other architectures.

There is a final wrinkle, related to the reason why I used the prefix my_ for the functions analogous to the functions in the C library: the C compiler can provide optimized shortcuts for some C library functions. For GCC, these are listed in the GCC documentation; the list includes strlen() .

This means we do not actually need the my_strlen() function, because we can use the optimized __builtin_strlen() function GCC provides, even in freestanding environment. The built-ins are usually very optimized; in the case of __builtin_strlen() on x86-64 using GCC-5.4.0, it optimizes to just a couple of register loads and a repnz scasb %es:(%rdi),%al instruction (which looks long, but actually takes just two bytes).

In other words, the final wrinkle is that there is a third type of function, compiler built-ins, that are provided by the compiler (but otherwise just like the functions provided by the C library) in optimized form, depending on the compiler options and architecture used.
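As a small sketch of that third kind of function, the following compares a hand-written loop with the compiler built-in. The names loop_strlen(), builtin_strlen() and check_agree() are hypothetical. check_agree() uses assert() and therefore assumes a hosted build; the other two only need <stddef.h>, which the compiler itself supplies. Note that depending on compiler version and options, the built-in may still be compiled into an ordinary call to strlen() when the argument is not a compile-time constant, so treat this as an illustration rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written loop, like my_strlen() above but returning size_t. */
static size_t loop_strlen(const char *str)
{
    size_t len = 0;

    while (str[len])
        len++;

    return len;
}

/* Same result through the GCC/Clang built-in; no <string.h> needed. */
static size_t builtin_strlen(const char *str)
{
    return __builtin_strlen(str);
}

/* Sanity check that both versions agree for a given string. */
static void check_agree(const char *str)
{
    assert(str);
    assert(loop_strlen(str) == builtin_strlen(str));
}
```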


If we were to expand the above example so that we'd open a file and write the Hello, world! into it, and compare the low-level unistd.h ( open() / write() / close() ) and standard I/O stdio.h ( fopen() / fputs() / fclose() ) approaches, we'd find that the major difference is that the FILE handle used by the standard I/O approach contains a lot of extra stuff (that makes the standard file handles quite versatile, just not useful in such a trivial example), most visible in the buffering approach it has. On the assembly level, we'd still see the same syscalls -- open , write , close -- used.
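The buffering can be observed from a hosted program. The sketch below is an illustration under assumptions, not a definitive test: file_size() and stdio_sizes() are hypothetical helper names, and the exact behaviour before the flush depends on the C library's buffer size and buffering mode. It writes a string with fputs() and checks the file size on disk before and after fflush().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Size of the file at path in bytes, or -1 on error. */
static long file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;

    return (long)st.st_size;
}

/* Write msg to path with standard I/O, recording the on-disk file size
 * before and after fflush(). Returns 0 on success, -1 on error. */
static int stdio_sizes(const char *path, const char *msg,
                       long *before, long *after)
{
    assert(path && msg && before && after);

    FILE *fp = fopen(path, "w");
    if (!fp)
        return -1;

    if (fputs(msg, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    *before = file_size(path);   /* data is typically still in the FILE buffer */

    if (fflush(fp) == EOF) {     /* this is where the write syscall happens */
        fclose(fp);
        return -1;
    }
    *after = file_size(path);

    return (fclose(fp) == EOF) ? -1 : 0;
}
```

With a short message on a regular file, one would typically expect *before to be 0 and *after to be the length of the message. With plain open()/write() there is no such gap, because every write() goes straight to the kernel.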

Even though at first glance the ELF format (used for binaries in Linux) contains a lot of "unneeded stuff" (about a kilobyte for our example program above), it is actually a very powerful format. It, and the dynamic loader in Linux, provides a way to auto-load libraries when a program starts (using LD_PRELOAD environment variable), and to interpose functions in other libraries -- essentially, replace them with new ones, but with a way to still be able to call the original interposed version of the function. There are lots of useful tricks, fixes, experiments, and debugging methods these allow.

Although the distinction between "system call" and "library function" can be a useful one to keep in mind, there's the issue that you have to be able to call system calls somehow. In general, then, every system call is present in the C library -- as a thin little library function that does nothing but make the transfer to the system call (however that's implemented).

So, yes, you can call open() from C code if you want to. (And somewhere, perhaps in a file called fopen.c , the author of your C library probably called it too, within the implementation of fopen() .)

The starting point for answering your question is to ask another question: What is a system call?

Generally, one thinks of a system call as a procedure that executes at an elevated processor privilege level. Generally, this means switching from user mode to kernel mode (some systems use multiple modes).

The mechanism for an application to enter kernel mode depends upon the system (and on Intel there are multiple ways). The general sequence for invoking a system service is that the process executes an instruction that triggers a change-processor-mode exception. The CPU responds to the exception by invoking the appropriate exception/interrupt handler, which then dispatches to the appropriate operating system service.

The problem for C programming is that invoking a system service requires executing a specific hardware instruction and setting hardware register values. Operating systems provide wrapper functions that handle the packing of parameters into registers, triggering the exception, then unpacking the return values from registers.

The open() function is usually such a wrapper, allowing high-level languages to invoke system services. If you think about it, fopen() is generally a "wrapper" for open().

So what we normally think of as a system call is a function that does nothing other than invoke a system service.
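One visible part of that "unpacking" on Linux is error handling: the kernel hands back a negative error code in a register, and the C library wrapper turns it into a return value of -1 plus a value in errno. A minimal sketch, assuming a hosted POSIX system (open_error() is a hypothetical helper name):

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try to open path read-only.
 * Returns 0 on success, or the errno value that the open() wrapper
 * stored when the underlying system call failed. */
static int open_error(const char *path)
{
    assert(path);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;    /* set by the wrapper, not by the C language */

    close(fd);
    return 0;
}
```

For a path that does not exist, open_error() should return ENOENT; for an existing, readable path it should return 0.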
