How to debug ARM Linux kernel (msleep()) lock up?

I am primarily looking for debugging tips. If someone can point out the one line of code to change, or the one peripheral configuration bit to set, to fix the problem, that would be terrific. But that's not what I'm hoping for; I'm looking more for how to go about debugging it.

Googling "msleep hang linux kernel site:stackoverflow.com" yields 13 results and none is on point, so I think I'm safe to ask.

I rebuilt an ARM Linux kernel for an embedded TI AM1808 ARM processor (Sitara/DaVinci?). I see the entire boot log up to the login: prompt coming out of the serial port, but trying to log in gets no response; it doesn't even echo what I type.

After a lot of debugging I arrived at the kernel and added debugging code between lines 828 and 830 (yes, the kernel version is 2.6.37). This is the point in kernel mode just before '/sbin/init' is called:

http://lxr.linux.no/linux+v2.6.37/init/main.c#L815

Right before line 830 I added a printk in an infinite loop, and I see the output. I let it run for a couple of hours and it counted to about 2 million. Sample line:

dbg:init/main.c:1202: 2088430

So it has spit out 60 million bytes without a problem.

However, if I add msleep(1000) in the loop, it prints only once, i.e. msleep() does not return.
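A rough userspace sketch of the counting trace described above may help make the setup concrete. The `dbg:file:line: count` layout is inferred from the sample line; the helper name `dbg_format` and the macro `DBG_COUNT` are hypothetical, and `snprintf` stands in for `printk` so the sketch builds outside the kernel.

```c
#include <stdio.h>

/* Hypothetical helper: formats one trace line such as
 * "dbg:init/main.c:1202: 2088430" into buf.
 * In kernel code the same format string would be passed to printk(). */
int dbg_format(char *buf, size_t len, const char *file, int line,
               unsigned long count)
{
    return snprintf(buf, len, "dbg:%s:%d: %lu", file, line, count);
}

/* Convenience macro (also hypothetical) that fills in the call site. */
#define DBG_COUNT(buf, len, count) \
    dbg_format((buf), (len), __FILE__, __LINE__, (count))
```

In the kernel test loop, the counter would be incremented and printed on every pass, with the msleep(1000) call placed between two such prints to show which one is never reached.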

Details: I added a conditional printk at line 4073 in the scheduler, gated on a flag that gets set at the start of the infinite test loop described above. It shows that schedule() is no longer called once the system hangs:
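The flag-gated trace could look roughly like the following userspace model. The names `dbg_trace_schedule`, `dbg_schedule_hits`, and `dbg_schedule_hook` are hypothetical, and `fprintf` stands in for `printk`; the idea is only that the hook stays silent during normal boot and starts reporting once the test loop sets the flag.

```c
#include <stdio.h>

/* Hypothetical flag: set to 1 just before the test loop starts. */
volatile int dbg_trace_schedule;

/* Number of gated trace lines emitted so far. */
unsigned long dbg_schedule_hits;

/* Hypothetical hook, meant to be called from inside schedule().
 * fprintf() stands in for printk() in this userspace model. */
void dbg_schedule_hook(void)
{
    if (!dbg_trace_schedule)
        return;                 /* stay quiet until the test begins */
    dbg_schedule_hits++;
    fprintf(stderr, "dbg: schedule() entered, hit %lu\n", dbg_schedule_hits);
}
```

If the hit count stops advancing after the flag is set, schedule() is no longer being entered, which is the observation reported above.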

http://lxr.linux.no/linux+v2.6.37/kernel/sched.c#L4064

The only selections under 'Device Drivers' in .config are: Block devices, I2C support, and SPI support.

The kernel and its ramdisk are loaded using U-Boot/TFTP. I don't believe it tries to use the Ethernet. Since all of this happens before '/sbin/init', very little should be happening.

More details: I have a very similar board with the same CPU. I can run the same uImage and the same ramdisk there and it works fine; I can log in and do the usual things.

I have run a memory test (64 MB total; I limited the kernel to 32 MB and tested the other 32 MB; it's a single-chip DDR2) and found no problem. One board uses UART0 and the other UART2, but the boot log comes out on both, so that should not be the problem.

Any debugging tips are greatly appreciated. I don't have an appropriate JTAG so I can't use that.

If msleep doesn't return or doesn't make it to schedule, then to debug it we can follow the call stack.

msleep calls schedule_timeout_uninterruptible(timeout), which calls schedule_timeout(timeout), which in the default case exits without calling schedule if the timeout in jiffies passed to it is < 0, so that is one thing to check.
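To make the branches easier to reason about, here is a simplified userspace model of the decision schedule_timeout makes on its argument. It is a sketch of the control flow as I understand it, not the kernel source; the enum, the function `classify_timeout`, and `MODEL_MAX_SCHEDULE_TIMEOUT` are hypothetical names introduced for illustration.

```c
#include <limits.h>

/* Stand-in for the kernel's MAX_SCHEDULE_TIMEOUT, which is LONG_MAX. */
#define MODEL_MAX_SCHEDULE_TIMEOUT LONG_MAX

enum timeout_path {
    PATH_SCHEDULE_NO_TIMER,   /* "sleep forever": schedule() with no timer */
    PATH_EARLY_EXIT,          /* negative timeout: return without schedule() */
    PATH_TIMER_THEN_SCHEDULE  /* arm a timer, then call schedule() */
};

/* Hypothetical model: which path would a given timeout (in jiffies) take? */
enum timeout_path classify_timeout(long timeout_jiffies)
{
    if (timeout_jiffies == MODEL_MAX_SCHEDULE_TIMEOUT)
        return PATH_SCHEDULE_NO_TIMER;
    if (timeout_jiffies < 0)
        return PATH_EARLY_EXIT;
    return PATH_TIMER_THEN_SCHEDULE;
}
```

Printing the timeout value at the top of schedule_timeout on the real board would show which of these paths msleep(1000) actually takes.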

If timeout is positive, then setup_timer_on_stack(&timer, process_timeout, (unsigned long)current); is called, followed by __mod_timer(&timer, expire, false, TIMER_NOT_PINNED); before calling schedule.

If we aren't getting to schedule, then something must be happening in either setup_timer_on_stack or __mod_timer.

The call trace for setup_timer_on_stack is: setup_timer_on_stack calls setup_timer_on_stack_key, which calls init_timer_on_stack_key. That function is either external, if CONFIG_DEBUG_OBJECTS_TIMERS is enabled, or calls init_timer_key(timer, name, key), which calls debug_init followed by __init_timer(timer, name, key).

__mod_timer first calls timer_stats_timer_set_start_info(timer); and then a whole lot of other functions.

I would advise starting by putting a printk or two in schedule_timeout, probably on either side of the setup_timer_on_stack call or on either side of the __mod_timer call.
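One way to bracket the suspect calls is a small wrapper macro that prints a marker before and after each one. The sketch below is a userspace model: `TRACE_AROUND`, `trace_print`, `trace_markers`, and the two `fake_` functions are hypothetical placeholders, and `fprintf` stands in for `printk`.

```c
#include <stdio.h>

/* Stand-in for printk() so the sketch builds in userspace. */
#define trace_print(...) fprintf(stderr, __VA_ARGS__)

/* Counts emitted markers so the sequence can be checked. */
int trace_markers;

/* Print a marker, run the call, then print a second marker. */
#define TRACE_AROUND(call)                                                \
    do {                                                                  \
        trace_print("dbg:%s:%d: before %s\n", __FILE__, __LINE__, #call); \
        trace_markers++;                                                  \
        call;                                                             \
        trace_print("dbg:%s:%d: after %s\n", __FILE__, __LINE__, #call);  \
        trace_markers++;                                                  \
    } while (0)

/* Hypothetical placeholders for the two real calls in schedule_timeout(). */
void fake_setup_timer_on_stack(void) { }
void fake_mod_timer(void) { }

/* Runs both bracketed calls and returns how many markers were emitted. */
int run_traced_sequence(void)
{
    trace_markers = 0;
    TRACE_AROUND(fake_setup_timer_on_stack());
    TRACE_AROUND(fake_mod_timer());
    return trace_markers;
}
```

If the last line on the serial console is a "before" marker with no matching "after", the hang is inside that call, and the same bracketing can be repeated one level further down.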

This problem has been solved.

With liberal use of printk it was determined that schedule() does indeed switch to another task, the idle task. In this instance, being embedded Linux, the original code base I copied from installed its own idle task. That idle task seems not to be appropriate for my board; it locked up the CPU, causing the crash. Commenting out the call to the idle task

http://lxr.linux.no/linux+v2.6.37/arch/arm/mach-davinci/cpuidle.c#L93

works around the problem.
