
How to prevent all symbols from a static library from being loaded, and why do other symbols from the same .o file get exported to test when linking a static library?

Suppose there are three C files: a.c contains the functions xx() and yy(), b.c contains nn() and mm(), and c.c contains qq() and rr().

I made a static library stat.a out of a.o, b.o and c.o. If I link stat.a into a program test that calls xx(), then the symbol yy() also gets exported: nm test shows both symbols xx and yy.

  1. I would like to know why the symbols qq and rr do not get exported?
  2. Is there any method to prevent any other symbols than xx being loaded?

  1. I would like to know why the symbols qq and rr do not get exported?

You have to inform the linker of your intention (see "How to force gcc to link an unused static library"):

gcc -L./ -o test test.c -Wl,--whole-archive stat.a -Wl,--no-whole-archive

  2. Is there any method to prevent any other symbols than xx being loaded?

From "How do I include only used symbols when statically linking with gcc?":

gcc -ffunction-sections -c a.c

gcc -L./ -o test test.c -Wl,--gc-sections stat.a

Here is an implementation of your scenario:

a.c

#include <stdio.h>

void xx(void)
{
    puts(__func__);
}

void yy(void)
{
    puts(__func__);
}

b.c

#include <stdio.h>

void nn(void)
{
    puts(__func__);
}

void mm(void)
{
    puts(__func__);
}

c.c

#include <stdio.h>

void qq(void)
{
    puts(__func__);
}

void rr(void)
{
    puts(__func__);
}

test.c

extern void xx(void);

int main(void)
{
    xx();
    return 0;
}

Compile all the *.c files to *.o files:

$ gcc -Wall -c a.c b.c c.c test.c

Make a static library stat.a, containing a.o, b.o and c.o:

$ ar rcs stat.a a.o b.o c.o

Link the program test, with inputs test.o and stat.a:

$ gcc -o test test.o stat.a

Run:

$ ./test
xx

Let's see the symbol tables of the object files in stat.a:

$ nm stat.a

a.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
                 U _GLOBAL_OFFSET_TABLE_
                 U puts
0000000000000000 T xx
0000000000000013 T yy

b.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
                 U _GLOBAL_OFFSET_TABLE_
0000000000000013 T mm
0000000000000000 T nn
                 U puts

c.o:
0000000000000000 r __func__.2250
0000000000000003 r __func__.2254
                 U _GLOBAL_OFFSET_TABLE_
                 U puts
0000000000000000 T qq
0000000000000013 T rr

The definitions (T) of xx and yy are in the member stat.a(a.o). The definitions of nn and mm are in stat.a(b.o). The definitions of qq and rr are in stat.a(c.o).

Let's see which of those symbols are also defined in the symbol table of the program test:

$ nm test | egrep 'T (xx|yy|qq|rr|nn|mm)'
000000000000064a T xx
000000000000065d T yy

xx, which is called in the program, is defined. yy, which is not called, is also defined. nn, mm, qq and rr, none of which is called, are all absent.

That's what you've observed.

I would like to know why the symbols qq and rr do not get exported?

What is a static library, such as stat.a, and what is its special role in a linkage?

It is an ar archive that conventionally, but not necessarily, contains nothing but object files. You offer such an archive to the linker so that it can select from it the object files it needs, if any, to carry on the linkage. The linker needs those object files in the archive that provide definitions for symbols that have been referenced, but not yet defined, in input files it has already linked. The linker extracts the needed object files from the archive and inputs them to the linkage, exactly as if they were individually named input files and the static library were not mentioned at all.

So what the linker does with an input static library is different from what it does with an input object file. Any input object file is linked into the output file unconditionally (whether it is needed or not).

In this light, let's redo the linkage of test with some diagnostics (--trace) to show which files are actually linked:

$ gcc -o test test.o stat.a -Wl,--trace
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
test.o
(stat.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o

Apart from all the boiler-plate files for a C program linkage that gcc adds by default, the only files of ours in the linkage are the two object files:

test.o
(stat.a)a.o

The linkage:

$ gcc -o test test.o stat.a

is exactly the same as the linkage:

$ gcc -o test test.o a.o

Let's think that through.

  • test.o was the first linker input. This object file was linked unconditionally into the program.
  • test.o contains a reference (specifically, a function call) to xx but no definition of the function xx.
  • So the linker now needs to find a definition of xx to complete the linkage.
  • The next linker input is the static library stat.a.
  • The linker searches stat.a for an object file that contains a definition of xx.
  • It finds a.o. It extracts a.o from the archive and links it into the program.
  • There are no other unresolved symbol references in the linkage for which the linker can find definitions in stat.a(b.o) or stat.a(c.o). So neither of those object files is extracted and linked.

By extracting and linking (just) stat.a(a.o), the linker has got the definition of xx that it needed to resolve the function call in test.o. But a.o also contains the definition of yy, so that definition is also linked into the program. nn, mm, qq and rr are not defined in the program because none of them is defined in the object files that were linked into the program.

That's the answer to your first question. Your second is:

Is there any method to prevent any other symbols than xx being loaded?

There are at least two ways.

One is simply to define each of xx, yy, nn, mm, qq and rr in a source file by itself. Then compile the object files xx.o, yy.o, nn.o, mm.o, qq.o and rr.o and archive all of them in stat.a. Then, if the linker ever needs to find an object file in stat.a that defines xx, it will find xx.o, extract and link it, and the definition of xx alone will be added to the linkage.

There's another way that does not require you to code just one function in each source file. This way depends on the fact that an ELF object file, as produced by the compiler, is composed of various sections, and these sections are in fact the units that the linker distinguishes and merges together into the output file. By default, there is one standard ELF section for each kind of content: the compiler places all of the function definitions in one code section and all data definitions in an appropriate data section.

The reason that your linkage of program test contains the definitions of both xx and yy is that the compiler has placed both of these definitions in the single code section of a.o. The linker can either merge that code section into the program or not: it can link the definitions of both xx and yy, or of neither, so it is obliged to link both, even though only xx is needed. Let's see the disassembly of the code section of a.o. By default the code section is called .text:

$ objdump -d a.o

a.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <xx>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # b <xx+0xb>
   b:   e8 00 00 00 00          callq  10 <xx+0x10>
  10:   90                      nop
  11:   5d                      pop    %rbp
  12:   c3                      retq

0000000000000013 <yy>:
  13:   55                      push   %rbp
  14:   48 89 e5                mov    %rsp,%rbp
  17:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 1e <yy+0xb>
  1e:   e8 00 00 00 00          callq  23 <yy+0x10>
  23:   90                      nop
  24:   5d                      pop    %rbp
  25:   c3                      retq

There you see the definitions of xx and yy, both in the .text section.

But you can ask the compiler to place the definition of each global symbol in its own section in the object file. Then the linker can separate the code section of any function definition from that of any other, and you can ask the linker to throw away any sections that aren't used in the output file. Let's try that.

Compile all the source files again, this time asking for a separate section per symbol:

$ gcc -Wall -ffunction-sections -fdata-sections -c a.c b.c c.c test.c

Now look again at the disassembly of a.o:

$ objdump -d a.o

a.o:     file format elf64-x86-64


Disassembly of section .text.xx:

0000000000000000 <xx>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # b <xx+0xb>
   b:   e8 00 00 00 00          callq  10 <xx+0x10>
  10:   90                      nop
  11:   5d                      pop    %rbp
  12:   c3                      retq

Disassembly of section .text.yy:

0000000000000000 <yy>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # b <yy+0xb>
   b:   e8 00 00 00 00          callq  10 <yy+0x10>
  10:   90                      nop
  11:   5d                      pop    %rbp
  12:   c3                      retq

Now we've got two code sections in a.o: .text.xx, containing only the definition of xx, and .text.yy, containing only the definition of yy. The linker can merge either of these sections into a program and not merge the other.

Rebuild stat.a:

$ rm stat.a
$ ar rcs stat.a a.o b.o c.o

Relink the program, this time asking the linker to discard unused input sections (-gc-sections). We'll also ask it to trace the files it loads (-trace) and to write a map file for us (-Map=mapfile):

$ gcc -o test test.o stat.a -Wl,-gc-sections,-trace,-Map=mapfile
/usr/bin/x86_64-linux-gnu-ld: mode elf_x86_64
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o
/usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o
test.o
(stat.a)a.o
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/lib/x86_64-linux-gnu/libc.so.6
(/usr/lib/x86_64-linux-gnu/libc_nonshared.a)elf-init.oS
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
libgcc_s.so.1 (/usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1)
/usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o
/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o

The -trace output is exactly the same as before. But check again which of our symbols are defined in the program:

$ nm test | egrep 'T (xx|yy|qq|rr|nn|mm)'
000000000000064a T xx

Only xx, which is what you want.

The output of the program is the same as before:

$ ./test
xx

Finally, look at the mapfile. Near the top you see:

mapfile

...
Discarded input sections
...
...
 .text.yy       0x0000000000000000       0x13 stat.a(a.o)
...
...

The linker was able to throw away the redundant code section .text.yy from the input file stat.a(a.o). That's why the redundant definition of yy is no longer in the program.
