When do Linux system calls trigger a segfault vs returning EFAULT?

I'm trying to understand when clock_gettime() can lead to errors. The man page lists the following two possibilities:

  1. EFAULT: tp points outside the accessible address space.
  2. EINVAL: The clk_id specified is not supported on this system.

It's easy to trigger an EINVAL error, but I'm not able to get clock_gettime() to set errno to EFAULT. Instead, the kernel sends a SIGSEGV signal to terminate the program. For instance, in the following code:

#include <time.h>
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>

int main()
{
    struct timespec tp;
    double time;

    if (clock_gettime(CLOCK_MONOTONIC, &tp + 4096) == -1) {
        if (errno == EINVAL) {
            perror("EINVAL");
            return EXIT_FAILURE;
        } else if (errno == EFAULT) {
            perror("EFAULT");
            return EXIT_FAILURE;
        } else {
            perror("something else");
            return EXIT_FAILURE;
        }
    }

    time = tp.tv_sec + 1e-9 * tp.tv_nsec;
    printf("%f\n", time);
}

How does the Linux kernel choose between triggering a segmentation fault and having the system call return -EINVAL? When will it choose to do the latter? If the kernel always sends the signal, is it actually necessary to check whether errno equals EFAULT?

I'm running Linux kernel 4.15 and I compiled the program with clang v6.0: clang -g -O0 -Wall -Wextra -Wshadow -Wstrict-aliasing -ansi -pedantic -Werror -std=gnu11 file.c -o file

clock_gettime is probably not executing as a syscall, but rather in userspace as part of the vDSO. If you actually perform a syscall by using the syscall function with SYS_clock_gettime as its argument, I would expect you to see EFAULT.

With that said, EFAULT is never something you should expect to be able to rely on. As soon as you pass an invalid pointer to a function that requires a valid pointer as part of its interface contract, you have undefined behavior, and a segfault or an error is only one possible manifestation among many. From this perspective it's something of a mistake that EFAULT is even documented.

I'm trying to understand when clock_gettime() can lead to errors.

Ok.

How does the Linux kernel choose between triggering a segmentation fault and having the system call return -EINVAL? When will it choose to do the latter?

It's easy. The function performs some checks, and if one of them fails it sets errno. If you access a protected memory region, the kernel sends SIGSEGV to your process.

If you inspect the __clock_gettime function from glibc, you see that:

switch (clock_id)
    {
#ifdef SYSDEP_GETTIME
      SYSDEP_GETTIME;
#endif

#ifndef HANDLED_REALTIME
    case CLOCK_REALTIME:
      ...
      break;
#endif

    default:
#if HP_TIMING_AVAIL
      if ((clock_id ...) == CLOCK_THREAD_CPUTIME_ID)
           ...
      else
#endif
            __set_errno (EINVAL);
      break;

The glibc wrapper sets EINVAL when it is given an unrecognized clock_id value.

Dereferencing a pointer value outside any valid memory region is undefined behaviour and spawns nasal demons. On Linux, SIGSEGV is a signal sent to a process that tries to write to a protected memory region.

The following code spawns demons and should raise SIGSEGV:

struct timespec tp;
*(&tp + 4096) = (struct timespec){0};

So does the following code:

struct timespec tp;
clock_gettime(CLOCK_MONOTONIC, &tp + 4096);

If the kernel always sends the signal,

Not really. If the sizeof(struct timespec) bytes starting at &tp + 4096 happen not to lie inside a protected memory region, the kernel will not send any signal, because as far as it can tell you are writing inside your own memory.

is it actually necessary to check whether errno equals EFAULT?

It's not necessary to check for any errors. I think you are mixing up interpreting errors with checking for them. If your machine follows the specification you mentioned, you can write your program on the assumption that a failing clock_gettime sets errno to EFAULT, as the Linux manual page for clock_gettime describes. However, as you discovered, it does not; instead undefined behaviour happens and the kernel raises SIGSEGV. That only means that the underlying implementation of the clock_gettime function does not follow the manual. POSIX does not specify the EFAULT errno code. However, I believe there may exist implementations that return EFAULT or other errno codes. But what do you want your program to do when it receives an EFAULT error? How would it recover from such an error? If these questions bear any significance to you, then it may be reasonable to write an EFAULT handler for the clock_gettime function.

Please note, you are using Linux. Linux, both the kernel and glibc, is mostly licensed under the GNU General Public License or the GNU Lesser General Public License, which contain the following:

BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

The question boils down to trust: do you believe your system's clock_gettime() follows the Linux manual? I don't. If your system were POSIX certified, you could place somewhat more trust in its functions working as the manual says. No one guarantees you that; it works only thanks to the good will of many hardworking people.
