"Unable to handle kernel NULL pointer dereference at virtual address." - On signalling the Kernel Module | OOPS

I was learning some basics of kernel modules and threads, so I tried to make an example module and test it. It loads successfully.

Module code:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/version.h>


static struct task_struct *thread_st;

// Function called by thread
static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while(!kthread_should_stop())
    {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);

        if(signal_pending(current))
            break;
    }
    printk(KERN_INFO "Thread Stopping\n");
    do_exit(0);
    return 0;
}



// Module initialisation
static int __init init_thread(void)
{
    printk(KERN_INFO "Creating Thread\n");

    thread_st = kthread_run(thread_fun, NULL, "mythread");
    if(thread_st)
        printk(KERN_INFO "Thread created successfully\n");
    else
        printk(KERN_INFO "Thread creation failed\n");
    return 0;

}




// Module exit
static void __exit cleanup_thread(void)
{
    printk(KERN_INFO "Cleaning up\n");
    if(thread_st)
    {
        kthread_stop(current);
        printk(KERN_INFO "Thread Stopped\n");
    }
}

module_init(init_thread);
module_exit(cleanup_thread);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Pinkesh Badjatiya");
MODULE_DESCRIPTION("Simple Kernel Module");

Once the module is loaded, the procedure I follow to unload it is:

  1. Send a SIGKILL signal: sudo kill -9 [PID]
  2. Wait for dmesg to show 'Thread Stopping', which simply means that kthread_should_stop() has returned true.
  3. Remove the module: sudo rmmod [MODULE_NAME]

dmesg log:

[  492.979030] Creating Thread
[  492.979753] Thread created successfully
[  492.979776] Thread Running
[  497.985420] Thread Running
[  502.992223] Thread Running
[  507.999007] Thread Running
[  513.005837] Thread Running
[  518.012585] Thread Running
[  523.019354] Thread Running
[  528.026195] Thread Running
[  533.032919] Thread Running
[  538.039795] Thread Running
[  543.046588] Thread Running
[  548.053383] Thread Stopping
[  556.317200] Cleaning up
[  556.317212] Thread Stopped

Now, when I replace the variable current with the struct pointer thread_st that I originally created, then load the module and follow the same procedure as above to remove it, the kernel gives a panic (OOPS) and fills up the dmesg log.

I also get a Report Error popup on Ubuntu.

dmesg log:

[ 1269.832922] Creating Thread
[ 1269.833888] Thread created successfully
[ 1269.834217] Thread Running
[ 1274.839425] Thread Running
[ 1279.846211] Thread Running
[ 1284.853017] Thread Running
[ 1289.859819] Thread Running
[ 1294.866589] Thread Running
[ 1299.873353] Thread Stopping
[ 1305.758783] Cleaning up
[ 1305.758853] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1305.762603] IP: [<ffffffff81096d6b>] exit_creds+0x1b/0x70
[ 1305.766266] PGD 0 
[ 1305.769967] Oops: 0000 [#3] SMP 
[ 1305.774675] Modules linked in: kernel_thread_example(OE-) vmnet(OE) vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OE) cmac rmd160 crypto_null camellia_generic camellia_x86_64 cast6_avx_x86_64 cast6_generic cast5_avx_x86_64 cast5_generic cast_common deflate cts ctr gcm ccm serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common twofish_generic twofish_avx_x86_64 twofish_x86_64_3way xts twofish_x86_64 twofish_common xcbc sha256_ssse3 sha512_ssse3 des_generic aes_x86_64 lrw gf128mul glue_helper ablk_helper xfrm_user ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp af_key xfrm_algo bnep rfcomm bluetooth 6lowpan_iphc uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev media snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi arc4 snd_seq intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ath9k ath9k_common ath9k_hw crct10dif_pclmul snd_seq_device crc32_pclmul snd_timer ath ghash_clmulni_intel cryptd mac80211 joydev serio_raw snd cfg80211 i915 lpc_ich shpchp soundcore drm_kms_helper drm mei_me mei i2c_algo_bit mac_hid video wmi parport_pc ppdev lp parport hid_generic usbhid hid psmouse ahci libahci atl1c [last unloaded: kernel_thread_example]
[ 1305.817666] CPU: 3 PID: 4038 Comm: rmmod Tainted: G      D    OE 3.16.0-50-generic #66~14.04.1-Ubuntu
[ 1305.822078] Hardware name: HCL Infosystems Limited HCL ME LAPTOP/HCL Infosystems Limited, BIOS 203.T01 03/19/2011
[ 1305.826447] task: ffff8800a6221e90 ti: ffff880119700000 task.ti: ffff880119700000
[ 1305.830740] RIP: 0010:[<ffffffff81096d6b>]  [<ffffffff81096d6b>] exit_creds+0x1b/0x70
[ 1305.834968] RSP: 0018:ffff880119703e90  EFLAGS: 00010246
[ 1305.839081] RAX: 0000000000000000 RBX: ffff8800b6e065e0 RCX: 0000000000000000
[ 1305.843133] RDX: ffffffff81c8ea00 RSI: ffff8800b6e065e0 RDI: 0000000000000000
[ 1305.847062] RBP: ffff880119703e98 R08: 0000000000000086 R09: 0000000000000431
[ 1305.850897] R10: 0000000000000000 R11: ffff880119703c0e R12: ffff8800b6e065e0
[ 1305.854697] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f0325bb6240
[ 1305.858456] FS:  00007f0325595740(0000) GS:ffff88011fa60000(0000) knlGS:0000000000000000
[ 1305.862225] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1305.866197] CR2: 0000000000000000 CR3: 00000000b6e23000 CR4: 00000000000407e0
[ 1305.866199] Stack:
[ 1305.866206]  ffff8800b6e065e0 ffff880119703eb8 ffffffff8106abf2 0000000000000000
[ 1305.866211]  ffff8800b6e065e0 ffff880119703ee0 ffffffff81091868 0000000000000000
[ 1305.866216]  ffffffffc0a61000 0000000000000800 ffff880119703ef0 ffffffffc0a5f086
[ 1305.866217] Call Trace:
[ 1305.866232]  [<ffffffff8106abf2>] __put_task_struct+0x52/0x140
[ 1305.866241]  [<ffffffff81091868>] kthread_stop+0xd8/0xe0
[ 1305.866249]  [<ffffffffc0a5f086>] cleanup_thread+0x23/0xf9d [kernel_thread_example]
[ 1305.866259]  [<ffffffff810ebbb2>] SyS_delete_module+0x162/0x200
[ 1305.866268]  [<ffffffff8176edcd>] system_call_fastpath+0x1a/0x1f
[ 1305.866318] Code: ff ff 85 c0 0f 84 33 fe ff ff e9 0c fe ff ff 90 66 66 66 66 90 55 48 89 e5 53 48 8b 87 c0 05 00 00 48 89 fb 48 8b bf b8 05 00 00 <8b> 00 48 c7 83 b8 05 00 00 00 00 00 00 f0 ff 0f 74 23 48 8b bb 
[ 1305.866324] RIP  [<ffffffff81096d6b>] exit_creds+0x1b/0x70
[ 1305.866326]  RSP <ffff880119703e90>
[ 1305.866328] CR2: 0000000000000000
[ 1305.866378] ---[ end trace 0bd516c6629996c7 ]---

I am not able to figure out why this is happening.
I searched the internet but could not find any reason.

Also, is the variable current already declared in any of the above headers, and what is the problem with using thread_st, which I created above?

From the description of the kthread_stop() function:

If threadfn() may call do_exit() itself, the caller must ensure task_struct can't go away.

This means that a kthread cannot simply exit on its own if kthread_stop() will be called for it elsewhere. You should either exit only when kthread_should_stop() is found to be true, or grab a reference to the task_struct (in some way) before exiting.
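
The second option (grabbing a reference) could be sketched roughly as follows. This is a minimal, untested sketch rather than a definitive fix. It assumes kthread_create() plus wake_up_process() in place of kthread_run(), so that get_task_struct() is called before the thread can run and exit. It also assumes the thread function simply returns instead of calling do_exit(). The version guard is an assumption to cover kernels where the scheduler headers were split (4.11 and later).

```c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/err.h>
#include <linux/sched.h>
#include <linux/version.h>
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
#include <linux/sched/signal.h>   /* signal_pending() */
#include <linux/sched/task.h>     /* get_task_struct(), put_task_struct() */
#endif

static struct task_struct *thread_st;

static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while (!kthread_should_stop()) {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);
        if (signal_pending(current))
            break;  /* early exit; tolerable only because init pinned the task */
    }
    printk(KERN_INFO "Thread Stopping\n");
    return 0;       /* returning ends the kthread; no explicit do_exit() */
}

static int __init init_thread(void)
{
    struct task_struct *t = kthread_create(thread_fun, NULL, "mythread");

    if (IS_ERR(t))
        return PTR_ERR(t);

    get_task_struct(t);   /* take our reference before the thread can run */
    thread_st = t;
    wake_up_process(t);
    return 0;
}

static void __exit cleanup_thread(void)
{
    /* The thread may already have exited; the pinned task_struct keeps this call valid. */
    kthread_stop(thread_st);
    put_task_struct(thread_st);  /* drop our reference */
}

module_init(init_thread);
module_exit(cleanup_thread);
MODULE_LICENSE("GPL");
```

The extra reference is what "ensure task_struct can't go away" asks for: even if the thread exits after kill -9, thread_st still points to valid memory until put_task_struct() is called in the exit function.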

Wait for the dmesg to show 'Thread Stopping', which simply means that the kthread_should_stop() has returned true.

Because the loop also checks signal_pending(current), this would be true only without the allow_signal() call. kthread_should_stop() is true only when someone calls kthread_stop() for the given thread. For signals explicitly sent from user space (which allow_signal() makes possible), signal_pending(current) does not reflect the kthread_should_stop() state.

So both of your implementations are incorrect, because they exit the thread when a signal is explicitly sent from user space.

Additionally, using thread_st in the kthread function introduces a race condition: the thread function may start before kthread_run() returns (and before its result is assigned to thread_st).

Update:

After printing "Thread Stopping", the thread can wait until kthread_stop() is actually called:

static int thread_fun(void *unused)
{
    allow_signal(SIGKILL);
    while(!kthread_should_stop())
    {
        printk(KERN_INFO "Thread Running\n");
        ssleep(5);

        if(signal_pending(current))
            break;
    }
    printk(KERN_INFO "Thread Stopping\n");

    // Wait until kthread will be actually stopped.
    while(!kthread_should_stop())
    {
        /* 
         * Flush any pending signal.
         *
         * Otherwise interruptible wait will not wait actually.
         */
        flush_signals(current);
        /* Stopping thread is some sort of interrupt. That's why we need interruptible wait. */        
        set_current_state(TASK_INTERRUPTIBLE);
        if(!kthread_should_stop()) schedule();
        set_current_state(TASK_RUNNING);
    }

    return 0;
}
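
With thread_fun() written as above, the thread stays alive until kthread_stop() is called, so the exit function can pass thread_st to it. A hedged sketch of matching init and exit functions follows. It assumes thread_fun() is the function defined just above. The IS_ERR() check is an addition that is not in the question: kthread_run() reports failure with an ERR_PTR() value rather than NULL, so the question's if(thread_st) test would not catch a failed creation.

```c
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/kthread.h>
#include <linux/err.h>

static int thread_fun(void *unused);   /* the updated thread function above */

static struct task_struct *thread_st;

static int __init init_thread(void)
{
    printk(KERN_INFO "Creating Thread\n");

    thread_st = kthread_run(thread_fun, NULL, "mythread");
    if (IS_ERR(thread_st)) {
        int err = PTR_ERR(thread_st);

        printk(KERN_INFO "Thread creation failed\n");
        thread_st = NULL;
        return err;   /* a non-zero return makes the module load fail */
    }
    printk(KERN_INFO "Thread created successfully\n");
    return 0;
}

static void __exit cleanup_thread(void)
{
    printk(KERN_INFO "Cleaning up\n");
    /* thread_fun() is still waiting for kthread_should_stop(), so thread_st is valid. */
    kthread_stop(thread_st);
    printk(KERN_INFO "Thread Stopped\n");
}

module_init(init_thread);
module_exit(cleanup_thread);
MODULE_LICENSE("GPL");
```

Since init_thread() returns an error when thread creation fails, the module is never loaded in that case and cleanup_thread() does not need its own check on thread_st.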

  1. current always points to the currently running task. It is a macro provided by the kernel headers (<asm/current.h>, pulled in through <linux/sched.h>), so it needs to be used carefully. In the function below, you are trying to stop the task that called cleanup_thread(), i.e. the rmmod process, because cleanup_thread() is the module exit function:

     static void __exit cleanup_thread(void)
     {
         printk(KERN_INFO "Cleaning up\n");
         if(thread_st)
         {
             kthread_stop(current);
             printk(KERN_INFO "Thread Stopped\n");
         }
     }
  2. The probable cause of the issue is that you first kill the thread with kill -9. This causes the thread to exit and its task_struct to be freed. But since thread_st is not reset to NULL, it is a dangling pointer, i.e. it points to an already freed location.

Then, if you call kthread_stop(thread_st) in cleanup_thread(), you are actually passing an invalid memory location, and hence the kernel crashes.

Try setting thread_st to NULL before calling do_exit().
