Strange behaviour of `const char *` storage in library .so files

I have a library that I use in Android, though I am quite sure the problem is not specific to Android. This library contains a bunch of error codes I print to logcat, and each of them consists of a constant string.

...
if(...){ALOGE("Error in parameter XXXXXX");}
if(...){ALOGE("Error in parameter YYYYYY");}
if(...){ALOGE("Error in parameter ZZZZZZ");}
...

Today I noticed I have a large amount of data in my .rodata section (around 16 kB). So I ran strings mylib.so and got a bunch of those strings:

Error in parameter XXXXXX
Error in parameter YYYYYY
Error in parameter ZZZZZZ

I thought that, at a small extra printing cost (which should be fine, since these codes are rarely used), I could save a lot of space by splitting each string into two parts. The compiler should then do the job and store the common part as a single string, since compilers (Clang and GCC) have an optimization step that removes duplicated strings.

I did it this way (I have MANY of these, but they all follow this pattern; I know I should use a define, but this was a quick test):

...
if(...){ALOGE("Error in parameter %s","XXXXXX");}
if(...){ALOGE("Error in parameter %s","YYYYYY");}
if(...){ALOGE("Error in parameter %s","ZZZZZZ");}
...

What I found is that:

  1. The library is EXACTLY the same size. .rodata is now much smaller, but .text increased by almost the same amount (only a few bytes of difference).
  2. The strings command now prints the "Error in parameter %s" string only once, plus the separated parts. So there is no string merging taking place.
  3. It does not seem to matter whether I compile for 32 bits, 64 bits, etc.
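One possible reason the common prefix is never shared in the unsplit version: a C string ends at its NUL terminator, so two literals can only overlap in storage when one is a suffix of the other. A shared prefix has nowhere to put its own terminator. The sketch below illustrates this; `shared_tail` is a hypothetical helper, not something from the question, and whether a given toolchain actually merges suffixes depends on the compiler and linker options.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if `full` ends with `suffix`, return the position
 * inside `full` where `suffix` could share storage; otherwise NULL. */
const char *shared_tail(const char *full, const char *suffix)
{
    size_t full_len = strlen(full);
    size_t suffix_len = strlen(suffix);

    if (suffix_len > full_len)
        return NULL;

    /* The tail pointer is already a valid NUL-terminated string. */
    const char *tail = full + (full_len - suffix_len);
    return strcmp(tail, suffix) == 0 ? tail : NULL;
}
```

Calling `shared_tail("Error in parameter XXXXXX", "XXXXXX")` yields a pointer into the longer string, while asking for the prefix "Error in parameter " yields NULL, because the prefix would need a terminator in the middle of the longer literal.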

So, what is going on here? How can I fix it? Any guidance? What is the compiler doing? Thanks.

Extra data:

  • Compiler: CLANG 4.9 (4.8 gives the same result).
  • Flags: -Os -fexceptions -std=c++11 -fvisibility=hidden

EDIT:

I created an online test example using GCC (Online GCC), with the same results.

Split:

#include <stdio.h>
#include <stdlib.h>  /* rand() */
int main()
{
    int a = rand()%7;
    switch(a){
        case 0: printf("Hello, %s!\n","Anna"); break;
        case 1: printf("Hello, %s!\n","Bob"); break;
        case 2: printf("Hello, %s!\n","Clark"); break;
        case 3: printf("Hello, %s!\n","Danniel"); break;
        case 4: printf("Hello, %s!\n","Edison"); break;
        case 5: printf("Hello, %s!\n","Foo"); break;
        case 6: printf("Hello, %s!\n","Garret"); break;
    }
    return 0;
}

NonSplit:

#include <stdio.h>
#include <stdlib.h>  /* rand() */
int main()
{
    int a = rand()%7;
    switch(a){
        case 0: printf("Hello, Anna!\n"); break;
        case 1: printf("Hello, Bob!\n"); break;
        case 2: printf("Hello, Clark!\n"); break;
        case 3: printf("Hello, Danniel!\n"); break;
        case 4: printf("Hello, Edison!\n"); break;
        case 5: printf("Hello, Foo!\n"); break;
        case 6: printf("Hello, Garret!\n"); break;
    }
    return 0;
}

Compiled with:

gcc -Os -o main main.c
gcc -Os -o main2 main2.c

Sizes:

-rwxr-xr-x 1 20446 20446 8560 Nov 16 11:43 main                     
-rw-r--r-- 1 20446 20446  478 Nov 16 11:41 main.c 
-rwxr-xr-x 1 20446 20446 8560 Nov 16 11:42 main2   
-rw-r--r-- 1 20446 20446  443 Nov 16 11:39 main2.c

Strings:

    strings main2 | grep "Hello"                                  
Hello, Anna!                                                          
Hello, Bob!                                                          
Hello, Clark!                                                         
Hello, Danniel!                                                       
Hello, Edison!                                                       
Hello, Foo!                                                         
Hello, Garret!

    strings main | grep "Hello"                                  
Hello, %s!                                                          

Your expectations are all fairly correct, but the test cases are not sufficient to demonstrate the effect. First of all, binary executable files have a notion of "segment/section alignment" (or something like that). In brief, it means that the first byte of each section can only be placed at a file offset that is a multiple of some value (e.g. decimal 512). Unused space between sections is filled with zeros to meet this requirement. The data in your test cases does not exhaust that padding, so you cannot see any real difference. Next, if you want to compare the effect more clearly, you shouldn't link against the startup code, i.e. you should build a dynamic library with a minimal number of references instead of a regular executable.
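The padding effect described above can be sketched as simple rounding arithmetic. This is only an illustration: `align_up` is a hypothetical helper, and 512 is the example value from the text, not the alignment any particular linker uses.

```c
#include <stddef.h>

/* Round `size` up to the next multiple of `alignment` (alignment > 0).
 * Models the space a section occupies once padding is added. */
size_t align_up(size_t size, size_t alignment)
{
    return (size + alignment - 1) / alignment * alignment;
}
```

With an alignment of 512, both `align_up(300, 512)` and `align_up(420, 512)` give 512, so shrinking a section from 420 to 300 bytes leaves the file size unchanged. Only when the data crosses a boundary, as in `align_up(513, 512)` giving 1024, does the difference show up.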

Next, my test program. It differs a bit from yours, but not conceptually.

#include <stdio.h>

#if defined(_SPLIT)
#define LOG(str) printf("Very very very loooo-o-o-o-o-o-o-ooooong prefix %s", str )
#elif defined(_NO_SPLIT)
#define LOG(str) printf("Very very very loooo-o-o-o-o-o-o-ooooong prefix " str )
#else
#error "Don't know what you want."
#endif

int foo(void) {
    LOG("aaaaaaaa");
    LOG("bbbbbbbb");
    LOG("cccccccc");
    LOG("dddddddd");
    LOG("eeeeeeee");
    LOG("ffffffff");
    LOG("gggggggg");
    LOG("hhhhhhhh");
    LOG("iiiiiiii");
    LOG("jjjjjjjj");
    LOG("kkkkkkkk");
    LOG("llllllll");
    LOG("mmmmmmmm");
    LOG("nnnnnnnn");
    LOG("oooooooo");
    LOG("pppppppp");
    LOG("qqqqqqqq");
    LOG("rrrrrrrr");
    LOG("ssssssss");
    LOG("tttttttt");
    LOG("uuuuuuuu");
    LOG("vvvvvvvv");
    LOG("wwwwwwww");
    LOG("xxxxxxxx");
    LOG("yyyyyyyy");
    LOG("zzzzzzzz");
    return 0;
}

Then, let's create the dynamic libraries:

$ gcc --shared -fPIC -o t_no_split.so -D_NO_SPLIT test.c
$ gcc --shared -fPIC -o t_split.so -D_SPLIT test.c

And compare sizes:

-rwxr-xr-x  1 sysuser sysuser   12098 Nov 16 14:19 t_no_split.so
-rwxr-xr-x  1 sysuser sysuser    8002 Nov 16 14:19 t_split.so

IMO, there is a really notable difference. To be honest, I have not checked the per-section sizes, but you can do that yourself.

Of course, this doesn't mean that the unsplit strings use 12098 - 8002 bytes more than the split ones. It just means that the compiler/linker is obliged to use more space for t_no_split.so than for t_split.so, and this bloat is definitely caused by the difference in string sizes. Another interesting thing: splitting even offsets the small bloat in machine code caused by passing a second argument to printf().

P.S. My machine is x64 Linux, GCC 4.8.4.

You are only saving 19 bytes per string, but at the cost of passing an additional argument to what looks like a varargs function. At a minimum, that is an address load and a push.
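The 19-byte figure can be checked with a little arithmetic on literal sizes. The sketch below assumes every parameter name is six characters long, like the XXXXXX placeholder in the question; the function names are hypothetical, and the extra machine code at each call site is not counted.

```c
#include <stddef.h>

/* String bytes when every message is one complete literal.
 * sizeof on a literal includes the NUL terminator. */
size_t unsplit_bytes(size_t n_messages)
{
    return n_messages * sizeof("Error in parameter XXXXXX");
}

/* String bytes when the prefix lives in a single shared format string
 * and each message only contributes its parameter name. */
size_t split_bytes(size_t n_messages)
{
    return sizeof("Error in parameter %s") + n_messages * sizeof("XXXXXX");
}
```

Each unsplit message costs 26 bytes and each split one costs 7, which is where the 19 bytes come from. The shared format string adds a one-time 22 bytes, so the string data only shrinks once the per-call code growth stays below 19 bytes per message.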

Let me guess, ALOGE is actually a macro?

I don't think you need a define; you need a function (not inline) like:

void BadParameter(const char * paramName)
{
    ALOGE("Error in parameter %s", paramName);
}

... and replace all the calls with that.
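A minimal sketch of that suggestion follows, under a few loudly stated assumptions: the `ALOGE` definition here is a stand-in that writes to stderr (the real macro comes from the Android log headers), `__attribute__((noinline))` is a GCC/Clang extension used to keep the helper out of line, and `check_parameters` with its conditions is a hypothetical caller invented for illustration.

```c
#include <stdio.h>

/* Stand-in for the real logging macro (assumption: on Android, ALOGE is
 * provided by the platform; here it prints to stderr, so a newline is
 * added to the format by hand). */
#ifndef ALOGE
#define ALOGE(...) fprintf(stderr, __VA_ARGS__)
#endif

/* Kept out of line so the format string and the varargs call sequence
 * exist once, not at every call site. */
__attribute__((noinline))
void BadParameter(const char *paramName)
{
    ALOGE("Error in parameter %s\n", paramName);
}

/* Hypothetical caller: each check now passes a single pointer argument.
 * Returns the number of parameters that were rejected. */
int check_parameters(int x, int y)
{
    int errors = 0;
    if (x < 0) { BadParameter("XXXXXX"); errors++; }
    if (y < 0) { BadParameter("YYYYYY"); errors++; }
    return errors;
}
```

The design point is that each call site shrinks to loading one address and making a plain call, while the varargs setup is paid for only inside `BadParameter`.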
