Understanding kmap on 64-bit Linux

Let me start by admitting that the concept of high memory and low memory on Linux is still not completely clear in my mind, even after reading several relevant resources. However, from what I understand, on 64-bit Linux there's no high memory anyway (correct me if I am wrong).

I am trying to understand how kmap and address spaces work on Linux kernel version 5.8.1 configured with defconfig for arm64.

I have added the following system call:

SYSCALL_DEFINE1(mycall, unsigned long __user, user_addr)
{
    struct page *pages[1];
    int *p1, *p2;

    p1 = (int *) user_addr;
    *p1 = 1; /* this works */
    pr_info("kernel: first: 0x%lx", (long unsigned) p1);

    if (get_user_pages(user_addr, 1, FOLL_WRITE, pages, NULL) != 1)
        return -1;

    p2 = kmap(pages[0]);
    *p2 = 2; /* this also works */
    pr_info("kernel: second: 0x%lx", (long unsigned) p2);

    return 0;
}

From user-space I allocate a whole memory page (on a page boundary) which I pass to the kernel as a parameter to that system call. Modifying that memory by dereferencing either pointer from within the kernel works perfectly fine. However, the two pointers have different values:

[    4.493480] kernel: first: 0x4ff3000
[    4.493888] kernel: second: 0xffff000007ce9000

From what I understand, get_user_pages returns the physical page corresponding to that user address (in current's address space). Then, since there's no high memory, I expected kmap to return the exact same address from the user part of the address space.

According to the virtual memory layout of arm64, the address returned by kmap lies in a range described as "kernel logical memory map". Is this a new mapping just created by kmap or is this another previously existing mapping for the same page?

Can somebody explain what exactly is going on here?

The memory referred to by user_addr (or p1) and by p2 will be the same physical memory pages once they have actually been pinned into physical memory by get_user_pages(). (Before the get_user_pages() call, the pages might not be in physical memory yet.) However, user_addr (and p1) is a user-space address of the page, and p2 is a kernel-space address of the page. In general, kmap() creates a temporary mapping of a physical memory page into kernel space. On a kernel built without CONFIG_HIGHMEM, which is the case on arm64, no new mapping is needed: kmap() simply returns the address the page already has in the kernel's linear ("logical") map, which is why p2 lies in that range.

On arm64 (and also amd64), if bit 63 is treated as a sign bit, then user-space addresses are non-negative and kernel-space addresses are negative. So there is no way that the numeric values of the user-space and kernel-space addresses can be equal.

Most kernel code should not dereference user-space pointers directly. The user-space memory access functions and macros should be used, and should be checked for failures. The first part of your example should be something like:

    int __user *p1 = (int __user *)user_addr;

    if (put_user(1, p1))
        return -EFAULT;
    pr_info("kernel: first: 0x%lx\n", (unsigned long)p1);

put_user() will return 0 on success or -EFAULT on failure.

get_user_pages() will return either the number of pages pinned into memory, or a negative errno value if none of the requested pages could be pinned. (It will only return 0 if the number of requested pages is 0.) The number of pages actually pinned may be less than the number requested, but since your code only requests a single page, the return value in that case would be either 1 or a negative errno value. You can use a variable to capture the error number. Note that it must be called with the current task's mmap semaphore locked:

#define NR_REQ 1

    struct page *pages[NR_REQ];
    long nr_gup;

    mmap_read_lock(current->mm);
    nr_gup = get_user_pages(user_addr, NR_REQ, FOLL_WRITE, pages, NULL);
    mmap_read_unlock(current->mm);
    if (nr_gup < 0)
        return nr_gup;
    if (nr_gup < NR_REQ) {
        /* Some example code to deal with not all pages pinned - just 'put' them. */
        long i;

        for (i = 0; i < nr_gup; i++)
            put_page(pages[i]);
        return -ENOMEM;
    }

Note: You could use get_user_pages_fast() instead of get_user_pages(). If get_user_pages_fast() is used, the calls to mmap_read_lock() and mmap_read_unlock() above must be removed:

#define NR_REQ 1

    struct page *pages[NR_REQ];
    long nr_gup;

    nr_gup = get_user_pages_fast(user_addr, NR_REQ, FOLL_WRITE, pages);
    if (nr_gup < 0)
        return nr_gup;
    if (nr_gup < NR_REQ) {
        /* Some example code to deal with not all pages pinned - just 'put' them. */
        long i;

        for (i = 0; i < nr_gup; i++)
            put_page(pages[i]);
        return -ENOMEM;
    }

kmap() will temporarily map a page into kernel address space. It should be paired with a call to kunmap() to release the temporary mapping. Note that kunmap() takes the struct page pointer, not the address returned by kmap():

    p2 = kmap(pages[0]);
    /* do something with p2 here ... */
    kunmap(pages[0]);

Pages pinned by get_user_pages() need to be 'put' using put_page() when finished with. If they have been written to, they first need to be marked 'dirty' using set_page_dirty_lock(). The last part of your example should be something like:

    p2 = kmap(pages[0]);
    *p2 = 2; /* this also works */
    pr_info("kernel: second: 0x%lx\n", (unsigned long)p2);
    kunmap(pages[0]); /* kunmap() takes the page, not the mapped address */
    set_page_dirty_lock(pages[0]);
    put_page(pages[0]);

The above code is not completely robust. The pointer p2 could be misaligned for the *p2 dereference, or *p2 could straddle a page boundary. Robust code needs to deal with such situations.

Accessing memory through user-space addresses has to go through the special user-space access functions and macros, may sleep due to page faults (unless the pages have been locked into physical memory), and is only valid (if at all) within a single process. For those reasons, locking the user-space address region into memory with get_user_pages() and mapping the pages to kernel address space (if required) is useful in some circumstances. It allows the memory to be accessed from an arbitrary kernel context such as an interrupt handler. It allows bulk copies to and from memory mapped I/O (memcpy_toio() or memcpy_fromio()). DMA operations can be performed on user memory once it has been locked down by get_user_pages(). In that case the pages will be mapped to "DMA addresses" by the DMA API.
