Walking page tables of a process in Linux

I'm trying to walk the page tables of a process in Linux. In a kernel module I implemented the following function:

static struct page *walk_page_table(unsigned long addr)
{
    pgd_t *pgd;
    pte_t *ptep, pte;
    pud_t *pud;
    pmd_t *pmd;

    struct page *page = NULL;
    struct mm_struct *mm = current->mm;

    pgd = pgd_offset(mm, addr);
    if (pgd_none(*pgd) || pgd_bad(*pgd))
        goto out;
    printk(KERN_NOTICE "Valid pgd");

    pud = pud_offset(pgd, addr);
    if (pud_none(*pud) || pud_bad(*pud))
        goto out;
    printk(KERN_NOTICE "Valid pud");

    pmd = pmd_offset(pud, addr);
    if (pmd_none(*pmd) || pmd_bad(*pmd))
        goto out;
    printk(KERN_NOTICE "Valid pmd");

    ptep = pte_offset_map(pmd, addr);
    if (!ptep)
        goto out;
    pte = *ptep;

    page = pte_page(pte);
    if (page)
        printk(KERN_INFO "page frame struct is @ %p", page);

 out:
    return page;
}

This function is called from the ioctl handler, and addr is a virtual address in the process address space:

static int my_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long addr)
{
   struct page *page = walk_page_table(addr);
   ...
   return 0;
}

The strange thing is that the user-space process calling ioctl segfaults, yet the way I look up the page table entry seems correct, because dmesg shows, for example, the following for each ioctl call:

[ 1721.437104] Valid pgd
[ 1721.437108] Valid pud
[ 1721.437108] Valid pmd
[ 1721.437110] page frame struct is @ c17d9b80

So why can't the process complete the ioctl call correctly? Maybe I have to lock something before walking the page tables?

I'm working with kernel 2.6.35-22 and three-level page tables.
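
As a rough illustration of what a three-level walk does with addr, the sketch below splits a 32-bit virtual address into table indices. It assumes the x86 PAE layout (2 + 9 + 9 + 12 bits), which the post does not state; split_va and va_parts are hypothetical names, not kernel APIs. On such a configuration the kernel folds the pud level into the pgd, so only three of the four lookups index a real table.

```c
#include <stdint.h>

/* Assumed x86 PAE layout (not stated in the post): 2 + 9 + 9 + 12 bits. */
enum {
    PAGE_SHIFT_PAE  = 12,
    PMD_SHIFT_PAE   = 21,
    PGDIR_SHIFT_PAE = 30,
    PTRS_PER_TABLE  = 512   /* entries in one PMD or PTE table */
};

/* Hypothetical helper type: the pieces of one virtual address. */
struct va_parts {
    unsigned pgd_index;     /* top 2 bits */
    unsigned pmd_index;     /* next 9 bits */
    unsigned pte_index;     /* next 9 bits */
    unsigned page_offset;   /* low 12 bits */
};

static struct va_parts split_va(uint32_t addr)
{
    struct va_parts p;

    p.pgd_index   = addr >> PGDIR_SHIFT_PAE;
    p.pmd_index   = (addr >> PMD_SHIFT_PAE) & (PTRS_PER_TABLE - 1);
    p.pte_index   = (addr >> PAGE_SHIFT_PAE) & (PTRS_PER_TABLE - 1);
    p.page_offset = addr & ((1u << PAGE_SHIFT_PAE) - 1);
    return p;
}
```

Calling split_va() on an address taken from the process (for example one inside the stack mapping) gives the slot that pgd_offset(), pmd_offset() and pte_offset_map() would each select under this assumed layout.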

Thank you all!

pte_unmap(ptep); 

is missing just before the out label. Try changing the code this way:

    ...
    page = pte_page(pte);
    if (page)
        printk(KERN_INFO "page frame struct is @ %p", page);

    pte_unmap(ptep); 

out:

Look at /proc/<pid>/smaps in the proc filesystem; it shows the userspace memory mappings:

cat smaps 
bfa60000-bfa81000 rw-p 00000000 00:00 0          [stack]
Size:                136 kB
Rss:                  44 kB

It is printed by fs/proc/task_mmu.c (from the kernel source):

http://lxr.linux.no/linux+v3.0.4/fs/proc/task_mmu.c

   if (vma->vm_mm && !is_vm_hugetlb_page(vma))
               walk_page_range(vma->vm_start, vma->vm_end, &smaps_walk);
               show_map_vma(m, vma.....);
        seq_printf(m,
                   "Size:           %8lu kB\n"
                   "Rss:            %8lu kB\n"
                   "Pss:            %8lu kB\n"

Your function is somewhat like walk_page_range(). Looking into walk_page_range(), you can see that the smaps_walk structure is not supposed to change while the walk is in progress:

http://lxr.linux.no/linux+v3.0.4/mm/pagewalk.c#L153

For example:

                }
                if (walk->pgd_entry)
                        err = walk->pgd_entry(pgd, addr, next, walk);
                if (!err &&
                    (walk->pud_entry || walk->pmd_entry || walk->pte_entry

If the memory contents were to change, all of the above checks could become inconsistent.
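
To make the callback style concrete, here is a toy userspace model of it. It is only a sketch: toy_walk, toy_walk_range and count_page are hypothetical names, the 4096-byte page size is an assumption, and the real walk_page_range() descends through actual page table levels rather than stepping over addresses.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL    /* assumed page size */

/* Hypothetical stand-in for the kernel's struct mm_walk: a callback plus
 * caller-owned state. */
struct toy_walk {
    int (*pte_entry)(unsigned long addr, unsigned long next,
                     struct toy_walk *walk);
    void *private_data;
};

/* Visits [start, end) one page at a time. Stops at the first non-zero
 * callback result and returns it, mirroring how the excerpt propagates
 * err. */
static int toy_walk_range(unsigned long start, unsigned long end,
                          struct toy_walk *walk)
{
    unsigned long addr, next;
    int err = 0;

    for (addr = start; addr < end && !err; addr = next) {
        next = addr + TOY_PAGE_SIZE;
        if (walk->pte_entry)
            err = walk->pte_entry(addr, next, walk);
    }
    return err;
}

/* Example callback: counts the pages visited. */
static int count_page(unsigned long addr, unsigned long next,
                      struct toy_walk *walk)
{
    (void)addr;
    (void)next;
    ++*(unsigned long *)walk->private_data;
    return 0;
}
```

A caller fills in a struct toy_walk with count_page and a pointer to an unsigned long counter, then passes it to toy_walk_range() together with the address range. The callbacks see each entry only at the moment it is visited, which is why the underlying tables must not change during the walk.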

All this means that you have to hold mmap_sem while walking the page tables:

   if (!down_read_trylock(&mm->mmap_sem)) {
            /*
             * Activate page so shrink_inactive_list is unlikely to unmap
             * its ptes while lock is dropped, so swapoff can make progress.
             */
            activate_page(page);
            unlock_page(page);
            down_read(&mm->mmap_sem);
            lock_page(page);
    }

followed by unlocking:

up_read(&mm->mmap_sem);

And of course, when you call printk() on the page tables inside your kernel module, the module is running in the process context of your insmod process (just printk the "comm" field and you will see "insmod"). This means mmap_sem is locked and the process is not running, so there is no console output until the process has completed (all printk() output goes to memory only).

Sounds logical?
