How to debug a Linux kernel that freezes during boot?

I have a legacy device with a binary Linux 2.6.18 kernel that boots normally to its rootfs. However, if I compile this kernel from source, the resulting kernel binary freezes during boot. I don't have the .config file that was used to build the kernel binary that currently boots normally.

The boot freezes without printing any error output. Here is the boot log:

Linux version 2.6.18-6.2 (myuser@host) (gcc version 4.2.0 20070124 (prerelease) - BRCM 10ts-20080721) #10 SMP Sun Apr 28 18:25:24 BRT 2013
Fetching vars from bootloader... OK (E,d,B,C)
Detected 512 MB on MEMC0 (strap 0x23430310)
Board strapped at 512 MB, default is 256 MB
Options: sata=1 enet=1 emac_1=1 no_mdio=0 docsis=0 ebi_war=0 pci=1 smp=1
CPU revision is: 0002a044
FPU revision is: 00130001
Primary instruction cache 32kB, physically tagged, 2-way, linesize 64 bytes.
Primary data cache 64kB, 4-way, linesize 64 bytes.
<6>Synthesized TLB refill handler (23 instructions).
<6>Synthesized TLB load handler fastpath (37 instructions).
<6>Synthesized TLB store handler fastpath (37 instructions).
<6>Synthesized TLB modify handler fastpath (36 instructions).
Determined physical RAM map:
 memory: 10000000 @ 00000000 (usable)
 memory: 10000000 @ 20000000 (usable)
Using 32MB for memory, overwrite by passing mem=xx
User-defined physical RAM map:
node [00000000, 02000000: RAM]
node [02000000, 0e000000: RSVD]
node [20000000, 10000000: RAM]
<5>Reserving 224 MB upper memory starting at 02000000
<7>On node 0 totalpages: 65536
<7>  DMA zone: 65536 pages, LIFO batch:15
<7>On node 1 totalpages: 65536
<7>  Normal zone: 65536 pages, LIFO batch:15
Built 2 zonelists.  Total pages: 131072
<5>Kernel command line: root=/dev/mtdblock3 rw rootfstype=jffs2 console=ttyS0,115200
PID hash table entries: 4096 (order: 12, 16384 bytes)
mips_counter_frequency = 202000000 from Calibration, = 202500000 from header(CPU_MHz/2)
Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
Memory: 286336k/524288k available (2924k kernel code, 237760k reserved, 544k data, 164k init, 0k highmem)
Mount-cache hash table entries: 512
Checking for 'wait' instruction...  available.
plat_prepare_cpus: ENABLING 2nd Thread...
TP0: prom_boot_secondary: Kick off 2nd CPU...
CPU revision is: 0002a044
FPU revision is: 00130001
Primary instruction cache 32kB, physically tagged, 2-way, linesize 64 bytes.
Primary data cache 64kB, 4-way, linesize 64 bytes.
Synthesized TLB refill handler (23 instructions).
Brought up 2 CPUs
migration_cost=1000
NET: Registered protocol family 16
registering PCI controller with io_map_base unset
registering PCI controller with io_map_base unset
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
NET: Registered protocol family 2
IP route cache hash table entries: 16384 (order: 4, 65536 bytes)
TCP established hash table entries: 65536 (order: 7, 524288 bytes)
TCP bind hash table entries: 32768 (order: 6, 262144 bytes)
TCP: Hash tables configured (established 65536 bind 32768)
TCP reno registered
brcm-pm: disabling power to USB block
brcm-pm: disabling power to ENET block
brcm-pm: disabling power to SATA block
squashfs: version 3.2-r2 (2007/01/15) Phillip Lougher
JFFS2 version 2.2. (NAND) (SUMMARY)  (C) 2001-2006 Red Hat, Inc.
io scheduler noop registered
io scheduler anticipatory registered (default)
io scheduler deadline registered
io scheduler cfq registered
Serial: 8250/16550 driver $Revision: 1.1.1.1 $ 3 ports, IRQ sharing disabled
serial8250: ttyS0 at MMIO 0x0 (irq = 22) is a 16550A
serial8250: ttyS1 at MMIO 0x0 (irq = 66) is a 16550A
serial8250: ttyS2 at MMIO 0x0 (irq = 67) is a 16550A
loop: loaded (max 8 devices)
brcm-pm: enabling power to ENET block

How do I go about debugging this? Any insights on possible solutions to the freeze are welcome as well.

One way to deal with this is to enable CONFIG_EARLY_PRINTK and add printk() statements to the kernel code that you suspect is freezing (most likely some driver configuration parameters are wrong).
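A minimal sketch of the printk() approach, assuming you have already picked a driver init routine to instrument. BOOT_MARK and suspect_driver_init are hypothetical names used for illustration only; neither exists in the kernel source. The last marker that reaches the serial console brackets the code that hangs. The non-kernel branch exists only so the sketch can be compiled and tried in userspace.

```c
/* Sketch only: BOOT_MARK is a hypothetical helper, not an existing kernel macro. */
#ifdef __KERNEL__
#include <linux/kernel.h>
/* KERN_EMERG is a string prefix, so it concatenates with the format string. */
#define BOOT_MARK_PRINT(...) printk(KERN_EMERG __VA_ARGS__)
#else
#include <stdio.h>
/* Userspace stand-in so the sketch can be built outside the kernel tree. */
#define BOOT_MARK_PRINT(...) fprintf(stderr, __VA_ARGS__)
#endif

/* Print the enclosing function name and source line of the marker. */
#define BOOT_MARK() BOOT_MARK_PRINT("BOOT_MARK %s:%d\n", __func__, __LINE__)

/* Hypothetical stand-in for a driver init routine under suspicion. */
static int suspect_driver_init(void)
{
    BOOT_MARK();    /* reached the init routine */
    /* ... power up the block, map registers ... */
    BOOT_MARK();    /* survived hardware setup */
    /* ... register the device ... */
    BOOT_MARK();    /* about to return */
    return 0;
}
```

If the console shows the first marker but not the second, the hang is in the code between them; move the markers closer together and rebuild until a single statement is isolated.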

Also, you might be able to recover the old kernel config by looking at /boot/config-* or at /proc/config.gz (the latter exists only if the old kernel was built with the option CONFIG_IKCONFIG_PROC enabled).

There are some debugger options like kdb and kgdb, but I've always found them flaky and temperamental, probably more so if you can't even get your machine to boot. I concur with the CONFIG_EARLY_PRINTK advice, and would advise you to make sure you get kernel output on boot (not "quiet"), but it seems you have this already.

The "GPIO" suggestion above could work, but it is very system-dependent and cumbersome. That said, I think you want a better answer than "start adding a lot of printk's". You can start with the offending Ethernet driver (brcm-pm?), since the last line of the log concerns the ENET block, or try removing it to see whether the freeze is related.

It'll take some investigation. Sorry, but there is no "magic bullet"! :-O

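Add initcall_debug to CONFIG_CMDLINE (the kernel command line). The kernel then logs each initcall as it invokes it, so the last one logged before the freeze is the one to investigate.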

CONFIG_CMDLINE="root=/dev/ram0 rw mem=512M@0x0 initrd=0x800000,16M console=ttyS0,38400n8 rootfstype=ext2 init=/bin/busybox init -s initcall_debug"
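
With initcall_debug enabled the console log gets long, so it can help to capture the serial output to a file and pick out the last initcall announcement mechanically. The sketch below does that for a log held in memory. It assumes each announcement contains the text "Calling initcall", which is what I would expect from a kernel of this age but have not verified against this particular tree; change INITCALL_MARKER to match what your kernel prints. The function name last_initcall_line is hypothetical. Also note that these messages may be emitted at debug level, so the console log level may need to be raised for them to appear.

```c
#include <stddef.h>
#include <string.h>

/* Assumed marker text; adjust to what the kernel under test actually prints. */
#define INITCALL_MARKER "Calling initcall"

/*
 * Copy the last line of `log` that contains INITCALL_MARKER into `out`
 * (truncated to fit, always NUL-terminated, without the trailing newline).
 * Returns 1 if such a line was found, 0 otherwise.
 */
static int last_initcall_line(const char *log, char *out, size_t outlen)
{
    const char *hit = NULL;
    const char *p = log;
    const char *start;
    const char *end;
    size_t len;

    if (outlen == 0)
        return 0;
    out[0] = '\0';

    /* Remember the final occurrence of the marker. */
    while ((p = strstr(p, INITCALL_MARKER)) != NULL) {
        hit = p;
        p += strlen(INITCALL_MARKER);
    }
    if (hit == NULL)
        return 0;

    /* Widen the match to the whole line that contains it. */
    start = hit;
    while (start > log && start[-1] != '\n')
        start--;
    end = strchr(hit, '\n');
    len = end ? (size_t)(end - start) : strlen(start);

    if (len >= outlen)
        len = outlen - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

The function named on the returned line is the initcall that was running when the boot stalled, which narrows down where to place printk() statements next.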
