
When does memory load cause bus error on x86-64 linux?

I used to think that x86-64 supports unaligned memory access and that an invalid memory access always causes a segmentation fault (except, perhaps, for SIMD instructions like movdqa or movaps). Nevertheless, I recently observed a bus error with a normal mov instruction. Here is a reproducer:

void test(void *a)
{
    asm("mov %0, %%rbp\n\t"
        "mov 0(%%rbp), %%rdx\n\t"
        : : "r"(a) : "rbp", "rdx");
}

int main()
{
    test((void *)0x706a2e3630332d69);
    return 0;
}

(It must be compiled with frame pointer omission, e.g. gcc -O test.c && ./a.out.)

The mov 0(%rbp), %rdx instruction and the address 0x706a2e3630332d69 were copied from a core dump of the buggy program. Changing the address to 0 causes a segfault, but merely aligning it to 0x706a2e3630332d60 still gives a bus error (my guess is that this is related to the fact that the address space is 48-bit on x86-64).

The question is: which addresses cause a bus error (SIGBUS)? Is it determined by the architecture or configured by the OS kernel (i.e. in the page table, control registers or something similar)?

SIGBUS is in a sad state. There is no consensus between operating systems about what it should mean, and when it is generated varies wildly between operating systems, CPU architectures, configurations and the phase of the moon. Unless you work with a very specific configuration, you should just treat it as "just like SIGSEGV, but different".

I suspect that originally it was supposed to mean "you tried a memory access that could not possibly be successful no matter what the kernel does"; in other words, the exact bit pattern you have in the address can never be a valid memory access. Most commonly this would mean an unaligned access on strict-alignment architectures. Then some systems started using it for accesses to virtual address space that doesn't exist (as in your example, where the address you have can't exist). Then, by accident, some systems made it also mean that userland tried to touch kernel memory (since, at least technically, that is virtual address space that doesn't exist from the point of view of userland). Then it became just random.

Other than that I've seen SIGBUS from:

  • access to a non-existent physical address from mmap:ed hardware.
  • exec of a non-exec mapping.
  • access to a perfectly valid mapping, but overcommitted memory couldn't be faulted in at that moment (I've seen SIGSEGV, SIGKILL and SIGBUS here; at least one operating system does this differently depending on which architecture you're on).
  • memory management deadlocks (and other "something went horribly wrong, but we don't know what" memory management errors).
  • stack red zone access.
  • hardware errors (ECC memory, PCI bus parity errors, etc.).
  • access to an mmap:ed file where the file contents don't exist (past the end of the file or in a hole).
  • access to an mmap:ed file where the file contents should exist, but don't (I/O errors).
  • access to normal memory that got swapped out and where the swap-in couldn't be performed (I/O error).
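With so many possible causes, it can help to look at what the kernel reports alongside the signal. The following is a minimal sketch, assuming a POSIX system where SIGBUS is delivered with a siginfo_t: it installs an SA_SIGINFO handler and decodes si_code using the three codes POSIX defines for SIGBUS. The helper names bus_code_name and install_sigbus_reporter are made up for this illustration, and which code (if any) a given system sets for each cause in the list above is implementation-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: map the si_code of a SIGBUS to a readable name. */
static const char *bus_code_name(int si_code)
{
    switch (si_code) {
    case BUS_ADRALN: return "BUS_ADRALN"; /* invalid address alignment */
    case BUS_ADRERR: return "BUS_ADRERR"; /* nonexistent physical address */
    case BUS_OBJERR: return "BUS_OBJERR"; /* object-specific hardware error */
    default:         return "other";      /* e.g. system-specific codes */
    }
}

/* SA_SIGINFO handler: report the cause and the faulting address, then exit. */
static void report_sigbus(int sig, siginfo_t *info, void *ucontext)
{
    char buf[128];
    int n;

    (void)sig;
    (void)ucontext;
    /* snprintf is not formally async-signal-safe; tolerated in a demo only. */
    n = snprintf(buf, sizeof buf, "SIGBUS: %s at %p\n",
                 bus_code_name(info->si_code), info->si_addr);
    if (n > 0) {
        if ((size_t)n >= sizeof buf)
            n = (int)sizeof buf - 1;
        if (write(STDERR_FILENO, buf, (size_t)n) < 0) {
            /* nothing sensible to do here */
        }
    }
    _exit(1);
}

/* Hypothetical helper: install the reporter; returns 0 on success, -1 on error. */
static int install_sigbus_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = report_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

Calling install_sigbus_reporter() early in main() is enough to get a one-line diagnosis on stderr before the process exits. The same handler can be registered for SIGSEGV as well, in which case the SEGV_* codes would need their own table.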

Generally, a SIGBUS can be sent on an unaligned memory access, e.g. when writing a 64-bit integer to an address that is not 8-byte aligned. However, on recent systems either the hardware itself handles it correctly (albeit a bit slower than an aligned access), or the OS emulates the access in an exception handler (with two or more separate memory accesses).
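Code that has to run on both kinds of system does not need to rely on either behaviour. As a small sketch (the helper names is_aligned and load_u64 are invented here), alignment can be tested explicitly, and a 64-bit value can be read from an arbitrary address through memcpy, which makes no alignment assumption and which compilers typically turn into a single load where the hardware allows it:

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if p is a multiple of align; align must be a power of two. */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Read a 64-bit value from any address without assuming alignment. */
static uint64_t load_u64(const void *p)
{
    uint64_t v;

    memcpy(&v, p, sizeof v);
    return v;
}
```

For example, load_u64(buf + 1) on a byte buffer is well defined, whereas dereferencing (uint64_t *)(buf + 1) directly is undefined behaviour in C even on hardware that tolerates the access.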

In this case, the problem is that an address outside the permissible virtual address space was specified. Although a pointer has 64 bits, 64-bit Intel processors with 48-bit virtual addressing require bits 48-63 to be copies of bit 47, so the only valid (canonical) ranges are 0x0-0x00007fffffffffff and 0xffff800000000000-0xffffffffffffffff. Linux gives its processes the lower range, 0 to 2^47-1 (0x0-0x7fffffffffff); the upper range is used by the kernel.

This means that the kernel sends a SIGBUS because of an access to an invalid address (here 0x706a2e3630332d69, which lies far above 0x7fffffffffff), as opposed to a SIGSEGV, which means that an access error on a valid address occurred (missing page table entry, wrong access rights, etc.).
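Whether a given 64-bit value can be an address at all is easy to check by hand. The sketch below assumes 48-bit virtual addressing, where the architecture requires bits 63..47 to be all zeros or all ones; the function names is_canonical48 and is_user_half48 are illustrative only, and the user/kernel split shown is the usual Linux layout for that configuration rather than something the hardware enforces.

```c
#include <stdint.h>

/* Assumption: 48-bit virtual addresses. An address is canonical when
 * bits 63..47 (17 bits) are all zeros or all ones. */
static int is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == 0x1ffff;
}

/* Lower canonical half, which Linux hands to user processes. */
static int is_user_half48(uint64_t addr)
{
    return addr <= UINT64_C(0x00007fffffffffff);
}
```

Both the address from the question, 0x706a2e3630332d69, and its aligned variant 0x706a2e3630332d60 fail is_canonical48, while 0 passes it, which matches the observation that alignment made no difference.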

The only situation where POSIX specifically requires generation of a SIGBUS is when you create a file-backed mmap region that extends beyond the end of the backing file by more than a whole page, and then access addresses sufficiently far past the end. (The exact words are "References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal.", from the specification of mmap.)

In all other circumstances, whether you get a SIGSEGV or a SIGBUS for an invalid memory access, or no signal at all, is left completely up to the implementation.
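The one case POSIX does pin down can be reproduced in a few lines. This is a sketch under several assumptions: a writable /tmp, a file of one byte, a mapping of two pages, and a handler that escapes with siglongjmp so the function can report what happened. The names sigbus_past_eof and on_sigbus_jump are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jump_env;

/* Leave the faulting access by jumping back to the sigsetjmp point. */
static void on_sigbus_jump(int sig)
{
    (void)sig;
    siglongjmp(jump_env, 1);
}

/* Returns 1 if touching a whole page past EOF raised SIGBUS,
 * 0 if no signal arrived, -1 if the setup failed. */
static int sigbus_past_eof(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX"; /* assumed writable location */
    long page = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    volatile char *p;
    int fd, got;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                            /* file vanishes on close */
    if (write(fd, "x", 1) != 1) {            /* file is 1 byte long */
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)(2 * page), PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus_jump;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) {
        munmap((void *)p, (size_t)(2 * page));
        return -1;
    }

    if (sigsetjmp(jump_env, 1) == 0) {
        (void)p[0];      /* first page is partly backed by the file: fine */
        (void)p[page];   /* second page lies wholly past EOF */
        got = 0;
    } else {
        got = 1;         /* arrived here via siglongjmp from the handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)p, (size_t)(2 * page));
    return got;
}
```

On Linux the function is expected to return 1. Reading p[1], by contrast, stays within the first page and normally just yields a zero byte, which is why the specification talks about whole pages following the end of the object.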


 