Porting a module to a newer Linux kernel: Cannot allocate memory

Question

I have a fairly large driver module that I am trying to build for a recent Linux kernel (3.4.4). The same module compiles and loads with insmod on a 2.6.27.25 kernel. The GCC versions also differ: 4.7.0 versus 4.3.0. The module is complicated enough that I cannot simply read through all of its code and makefiles.

When inserting the module, I get "Cannot allocate memory" with the following trace:

vmap allocation for size 30248960 failed: use vmalloc=<size> to increase size.
vmalloc: allocation failure: 30243566 bytes
insmod: page allocation failure: order:0, mode:0xd2
Pid: 5840, comm: insmod Tainted: G           O 3.4.4-5.fc17.i686 #1
Call Trace:
 [<c092702a>] ? printk+0x2d/0x2f
 [<c04eff8d>] warn_alloc_failed+0xad/0xf0
 [<c05178d9>] __vmalloc_node_range+0x169/0x1d0
 [<c0517994>] __vmalloc_node+0x54/0x60
 [<c0490825>] ? sys_init_module+0x65/0x1d80
 [<c0517a60>] vmalloc+0x30/0x40
 [<c0490825>] ? sys_init_module+0x65/0x1d80
 [<c0490825>] sys_init_module+0x65/0x1d80
 [<c050cda6>] ? handle_mm_fault+0xf6/0x1d0
 [<c0932b30>] ? spurious_fault+0xae/0xae
 [<c0932ce7>] ? do_page_fault+0x1b7/0x450
 [<c093665f>] sysenter_do_call+0x12/0x28
-- clip --

The obvious answer seems to be that the module is allocating too much memory. However:

I have no problem with the old kernel version, whatever the size of the module.
If I prune parts of the module to get a much lower memory consumption, I still get the same error message with the new kernel.
I can unload a lot of other modules, but it has no effect. (Is that even relevant? Does Linux have a global limit on the total memory used by modules?)

I therefore suspect a problem in the new kernel that is not directly related to a shortage of memory.

The new kernel complains about a vmalloc() of roughly 30,000 KB, whereas lsmod on the old kernel reports a size of 4,800 KB. Should these two figures be directly related? Could something have gone wrong during the build, so that far too much RAM is being requested? When I compare the section sizes of the two .ko files, I do not see big differences.
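For reference, one way to make that section-size comparison is to feed the output of binutils `size -A` through a small filter. The following is only a sketch: `sum_sections`, `old.ko` and `new.ko` are hypothetical names, and reporting `.debug*` sections separately is an assumption made here because those sections are often the largest ones in an object built with debug information.

```shell
# Sum the section sizes printed by `size -A`, reporting .debug* sections
# separately from everything else. Reads the `size -A` output on stdin.
# (sum_sections is a hypothetical helper name, not a standard tool.)
sum_sections() {
    awk 'NF == 3 && $2 ~ /^[0-9]+$/ {
             # Section lines have three fields: name, size, address.
             if ($1 ~ /^\.debug/) debug += $2; else other += $2
         }
         END { printf "debug: %d bytes\nother: %d bytes\n", debug, other }'
}

# Usage (file names are placeholders):
#   size -A old.ko | sum_sections
#   size -A new.ko | sum_sections
```

Running it on both builds shows quickly whether the difference lies in code and data or in sections that are not normally needed at run time.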

So I am trying to understand where the problem comes from. When I look at the dumped stack, I cannot find the matching piece of code. The failing vmalloc() seems to be called from sys_init_module(), which is init_module() in kernel/module.c, but the code does not match. When I check the object code of my .ko, the init_module() code does not match either.

I am more or less stuck, as I do not know the kernel well enough, and the build system and module loading are hard to follow. The error seems to occur before the module is actually loaded: I suspect that some functions are missing, yet insmod has not reported any such errors at this point.

Answer 1

I believe the allocation is done in layout_and_allocate, which is called by load_module. Both are static functions, so they may have been inlined and therefore do not appear in the stack trace.
So this is not an allocation made by your code, but one made by Linux in order to load your code.

If the module takes 4.8 MB on the old kernel and 30 MB on the new one, that would explain why loading fails.
So the question is why it is so large.

The size may come from the amount of code (which is unlikely to have grown that much) or from statically allocated data.
A likely explanation is that you have a large statically allocated array whose size is defined by a kernel constant. If that constant has grown significantly, your array grows with it.
A guess: an array whose size is NR_CPUS.

You should be able to use tools such as nm or objdump to find such an array, though I'm not sure exactly how.
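As a rough sketch of the nm approach, assuming GNU binutils: nm can print each symbol's size and sort by it, so the largest objects end up at the bottom of the listing. The helper name `largest_symbols` and the file name `module.ko` are made up for this example.

```shell
# Print the N largest symbols (default 20) of an object file, biggest last.
# Output columns: address, size (decimal), type, name.
# (largest_symbols is a hypothetical helper name.)
largest_symbols() {
    nm --print-size --size-sort --radix=d "$1" | tail -n "${2:-20}"
}

# Usage (file name is a placeholder):
#   largest_symbols module.ko
#   largest_symbols module.ko 5
```

Symbols of type B/b (bss) or D/d (initialized data) with a very large size are the candidates for an oversized static array.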

Answer 2

The problem was actually due to the debug sections in the module. The old kernel was able to ignore these sections, but the new one counts them in the total size to allocate. However, when I enabled the pr_debug() traces in module.c at load time, these sections were not dumped along with the others.

How to get rid of them and solve the problem:

objcopy -R .debug_aranges \
    -R .debug_info \
    -R .debug_abbrev \
    -R .debug_line \
    -R .debug_frame \
    -R .debug_str \
    -R .debug_loc \
    -R .debug_ranges \
    original.ko new.ko
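A shorter alternative to listing every section by hand, shown here only as a sketch, is binutils `strip --strip-debug`, followed by a check with `readelf` that no `.debug_*` section is left. The function names and file names below are hypothetical, and I have not verified this variant on the module in question.

```shell
# Hypothetical helpers; the function names are made up for this example.

# Write a copy of the module ($1) without debug sections to $2.
strip_debug() {
    strip --strip-debug -o "$2" "$1"
}

# Succeed if the file still contains any .debug_* section.
has_debug_sections() {
    readelf -S -W "$1" | grep -q '\.debug_'
}

# Usage (file names are placeholders):
#   strip_debug module.ko module-nodebug.ko
#   has_debug_sections module-nodebug.ko || echo "no debug sections left"
```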

It is also possible that this project's build files were adding debug information "tailored" to the old kernel version. However, a dummy module built the same way ends up with exactly the same kind of debug sections, so I rather suspect a policy change in module management, either in the kernel or in Fedora.

Any information about these changes is welcome.