Apple: Compile clang frame size with -O0 vs -O2 (kernel)
I have an existing project that we compile as DEBUG for developers (with -O0 so that lldb makes sense). However, one function in particular balloons in size when -O0 is used:
-O2 -Wframe-larger-than=100
warning: stack frame size of 168 bytes in function 'dsl_scan_visitbp'
-O0 -Wframe-larger-than=100
warning: stack frame size of 1160 bytes in function 'dsl_scan_visitbp'
and with some recursion, the stack can easily be overrun (16K stacks in kernel).
The first thing to inspect is the local variables, but I believe there are only two:
dsl_pool_t *dp = scn->scn_dp;
blkptr_t *bp_toread = NULL;
If you want to see the whole function: https://github.com/openzfs/zfs/blob/master/module/zfs/dsl_scan.c#L1908 (Linux sources, but I am dealing with the Apple clang port).
There are a bunch of always_inline functions in that source file, which may also come into play here.
But I am curious: why does it grow so large with -O0?
Then, what to do about it? I can't see any Apple clang #pragmas that turn optimization "on" for one function or one file (only ones that turn it off). If I knew what the cause was, perhaps I could control that specific issue with a different pragma.
The only solution I see right now is to have dsl_scan.c processed differently in the Makefile, so that only that file always gets -O2. But that is a bit tedious.
I'm not familiar with the code base, so I don't see any obvious variables that would take up large amounts of stack space. However, I notice that the functions (including the always_inline'd ones) are quite long. Typically, in debug builds, every variable and temporary expression result is assigned its own slot in the stack frame, regardless of scope. So even if two variables' lifetimes do not overlap (e.g. one is declared in the if block and another in the else block), they will be allocated separate spaces in memory. This adds up even when the variables and temporaries are all small and short-lived.
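To make the point about non-overlapping lifetimes concrete, here is a minimal sketch. The function label_len and its buffers are hypothetical and are not taken from dsl_scan.c; the two arrays live in different branches and are never needed at the same time.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical example (not from dsl_scan.c): the buffers `a` and `b`
 * live in different branches, so their lifetimes never overlap.
 * An unoptimized build will typically still reserve a separate stack
 * slot for each of them, while an optimized build is free to share
 * the space or remove the buffers entirely.
 */
size_t label_len(int flag)
{
    if (flag) {
        char a[256];
        memset(a, 'a', sizeof(a) - 1);
        a[sizeof(a) - 1] = '\0';
        return strlen(a);
    } else {
        char b[512];
        memset(b, 'b', sizeof(b) - 1);
        b[sizeof(b) - 1] = '\0';
        return strlen(b);
    }
}
```

Compiling this file once with `-O0 -Wframe-larger-than=100 -c` and once with `-O2 -Wframe-larger-than=100 -c` should show the difference in the reported frame size. The exact numbers depend on the compiler version and target, so none are quoted here.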
You are probably best off disabling the always_inline attributes in all functions called by this function in debug builds. This avoids pre-allocating memory in this function's frame for all possible branches of execution even if they are never taken, and for variables declared in functions that are not involved in the recursion.
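One way this could look is sketched below. It assumes the debug build defines a DEBUG macro. The names SCAN_INLINE, helper_sum and visit_depth are made up for illustration and do not exist in OpenZFS.

```c
/*
 * Hypothetical macro: force inlining only in optimized builds.
 * Assumes the debug build passes -DDEBUG. clang honors always_inline
 * even at -O0, so without this switch the helper's locals end up in
 * the frame of every caller.
 */
#if defined(DEBUG)
#define SCAN_INLINE static inline
#else
#define SCAN_INLINE static inline __attribute__((always_inline))
#endif

/* Hypothetical helper with a sizeable local array. */
SCAN_INLINE int helper_sum(int n)
{
    int scratch[64];
    int total = 0;

    for (int i = 0; i < 64; i++)
        scratch[i] = i * n;
    for (int i = 0; i < 64; i++)
        total += scratch[i];
    return total;
}

/*
 * Hypothetical recursive function. If helper_sum is inlined here,
 * `scratch` becomes part of every visit_depth frame on the recursion
 * chain. If it is a real call, the array only occupies the stack
 * while helper_sum itself is running.
 */
int visit_depth(int depth)
{
    if (depth <= 0)
        return 0;
    return helper_sum(depth) + visit_depth(depth - 1);
}
```

The trade-off is that optimized builds keep the forced inlining while debug builds pay for an extra call, which usually matters less than the stack depth when stepping through code in lldb.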