Avoid multiple definition linker error when not using the redefined symbols

I am trying to build an executable that links to various shared and static libraries. It turns out that two of the static libraries both define the same symbol, which results in a multiple definition linker error. My executable doesn't use this symbol, so it's not really a concern.

I can avoid the error by adding the --allow-multiple-definition linker flag, but that seems like a nuclear option. I would like the linker to complain if I try to use a multiply defined symbol.

Is there a way to tell the linker "complain for multiple definitions only if the symbol is used"? Or alternatively tell it, "from lib ABC ignore symbol XYZ". I am developing with g++ on Linux.

You may have one variant of the problem or another, depending on facts whose relevance you haven't yet considered. Or possibly you have a mixture of both, so I'll walk through a solution to each variant.

You should be familiar with the nature of static libraries and how they are consumed in linkage, as summarised here

The Superfluous Global Symbols Variant

Here are a couple of source files and a header file:

one.cpp

#include <onetwo.h>

int clash = 1;

int get_one()
{
    return clash;
}

two.cpp

#include <onetwo.h>

int get_two()
{
    return 2;
}

onetwo.h

#pragma once

extern int get_one();
extern int get_two();

These have been built into a static library libonetwo.a

$ g++ -Wall -Wextra -pedantic -I. -c one.cpp two.cpp
$ ar rcs libonetwo.a one.o two.o

whose intended API is defined in onetwo.h

Similarly, some other source files and a header have been built into a static library libfourfive.a, whose intended API is defined in fourfive.h:

four.cpp

#include <fourfive.h>

int clash = 4;

int get_four()
{
    return clash;
}

five.cpp

#include <fourfive.h>

int get_five()
{
    return 5;
}

fourfive.h

#pragma once

extern int get_four();
extern int get_five();

And here's the source of a program that depends on both libraries:

prog.cpp

#include <onetwo.h>
#include <fourfive.h>

int main()
{
    return get_one() + get_four();
}

which we try to build like so:

$ g++ -Wall -Wextra -pedantic -I. -c prog.cpp
$ g++ -o prog prog.o -L. -lonetwo -lfourfive
/usr/bin/ld: ./libfourfive.a(four.o):(.data+0x0): multiple definition of `clash'; ./libonetwo.a(one.o):(.data+0x0): first defined here
collect2: error: ld returned 1 exit status

encountering a name collision for the symbol clash, because it is globally defined in two of the object files that the linkage requires, one.o and four.o:

$ readelf -s libonetwo.a libfourfive.a | egrep '(File|Symbol|OBJECT|FUNC)'
File: libonetwo.a(one.o)
Symbol table '.symtab' contains 11 entries:
     9: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 clash
    10: 0000000000000000    16 FUNC    GLOBAL DEFAULT    1 _Z7get_onev
File: libonetwo.a(two.o)
Symbol table '.symtab' contains 10 entries:
     9: 0000000000000000    15 FUNC    GLOBAL DEFAULT    1 _Z7get_twov
File: libfourfive.a(four.o)
Symbol table '.symtab' contains 11 entries:
     9: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 clash
    10: 0000000000000000    16 FUNC    GLOBAL DEFAULT    1 _Z8get_fourv
File: libfourfive.a(five.o)
Symbol table '.symtab' contains 10 entries:
     9: 0000000000000000    15 FUNC    GLOBAL DEFAULT    1 _Z8get_fivev

The problem symbol clash is not referenced in our own code, prog.(cpp|o). You wondered:

Is there a way to tell the linker "complain for multiple definitions only if the symbol is used"?

No, there isn't, but that's immaterial. one.o would not have been extracted from libonetwo.a and linked into the program if the linker didn't need it to resolve some symbol. It needed it to resolve get_one. Likewise it only linked four.o because it's needed to resolve get_four. So the colliding definitions of clash are in the linkage. And although prog.o doesn't use clash, it does use get_one, which uses clash and which intends to use the definition of clash in one.o. Likewise prog.o uses get_four, which uses clash and intends to use the different definition in four.o.

Even if clash were unused by each library as well as by the program, the fact that it is defined in multiple object files that must be linked into the program means that the program will contain multiple definitions of it, and only --allow-multiple-definition (-Wl,--allow-multiple-definition when linking through g++) will allow that.
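The extraction rule described above can be modelled with a small toy simulation. This is only a sketch of the idea, not how ld is implemented: the names ObjectFile, Archive, multiply_defined, libonetwo() and libfourfive() are invented for illustration, and the model assumes each archive is visited once, in command-line order.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of an object file: the global symbols it defines and references.
struct ObjectFile {
    std::string name;
    std::set<std::string> defines;
    std::set<std::string> references;
};

using Archive = std::vector<ObjectFile>;

// Simplified static link: a member is extracted from an archive only if it
// defines a symbol that is still undefined at that point.
// Returns the symbols that end up with more than one definition.
std::set<std::string> multiply_defined(const ObjectFile& program,
                                       const std::vector<Archive>& archives)
{
    std::map<std::string, int> definitions;  // symbol -> linked definitions
    std::set<std::string> undefined;
    for (const auto& sym : program.defines) ++definitions[sym];
    for (const auto& sym : program.references)
        if (!definitions.count(sym)) undefined.insert(sym);

    for (const auto& archive : archives) {
        std::set<std::string> extracted;
        bool progress = true;
        while (progress) {  // rescan until no further member is needed
            progress = false;
            for (const auto& member : archive) {
                if (extracted.count(member.name)) continue;
                bool needed = false;
                for (const auto& sym : member.defines)
                    if (undefined.count(sym)) needed = true;
                if (!needed) continue;

                extracted.insert(member.name);
                progress = true;
                // Every global definition in the member enters the link,
                // whether or not anything refers to it.
                for (const auto& sym : member.defines) {
                    ++definitions[sym];
                    undefined.erase(sym);
                }
                for (const auto& sym : member.references)
                    if (!definitions.count(sym)) undefined.insert(sym);
            }
        }
    }

    std::set<std::string> clashes;
    for (const auto& entry : definitions)
        if (entry.second > 1) clashes.insert(entry.first);
    return clashes;
}

// The two libraries from this variant, as the model sees them.
Archive libonetwo()
{
    return {{"one.o", {"clash", "get_one"}, {}},
            {"two.o", {"get_two"}, {}}};
}

Archive libfourfive()
{
    return {{"four.o", {"clash", "get_four"}, {}},
            {"five.o", {"get_five"}, {}}};
}
```

In this model, a program that references get_one and get_four pulls in one.o and four.o and reports clash as multiply defined, while a program that references only get_two and get_five extracts neither of them and reports nothing.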

In that light you'll also see that:

Or alternatively [is there a way to] tell it, "from lib ABC ignore symbol XYZ".

in general won't fly. If we could tell the linker to ignore (say) the definition of clash in four.o and resolve the symbol everywhere to the definition in one.o (the only other candidate), then get_four() would return 1 instead of 4 in our program. That is in fact the effect of --allow-multiple-definition, since it causes the first definition in the linkage to be used.

By inspection of the source code of libonetwo.a (or libfourfive.a) we can fairly confidently spot the root cause of the problem. The symbol clash has been left with external linkage where it only needed internal linkage, since it isn't declared in the associated header file and is referenced nowhere in the library except in the file where it's defined. The offending source files should have written:

one_good.cpp

#include <onetwo.h>

namespace {
    int clash = 1;
}

int get_one()
{
    return clash;
}

four_good.cpp

#include <fourfive.h>

namespace {
    int clash = 4;
}

int get_four()
{
    return clash;
}

and all would be good:

$ g++ -Wall -Wextra -pedantic -I. -c one_good.cpp four_good.cpp
$ readelf -s one_good.o four_good.o | egrep '(File|Symbol|OBJECT|FUNC)'
File: one_good.o
Symbol table '.symtab' contains 11 entries:
     5: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    3 _ZN12_GLOBAL__N_15clashE
    10: 0000000000000000    16 FUNC    GLOBAL DEFAULT    1 _Z7get_onev
File: four_good.o
Symbol table '.symtab' contains 11 entries:
     5: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    3 _ZN12_GLOBAL__N_15clashE
    10: 0000000000000000    16 FUNC    GLOBAL DEFAULT    1 _Z8get_fourv

$ g++ -o prog prog.o one_good.o four_good.o
$ ./prog; echo $?
5
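For comparison, here is a minimal sketch of another way to give clash internal linkage: declaring it static at namespace scope. The file name one_static.cpp is hypothetical, and get_one is declared inline here rather than through onetwo.h so that the snippet stands alone.

```cpp
// one_static.cpp (hypothetical file name)

int get_one();         // stands in for the declaration in onetwo.h

static int clash = 1;  // `static` at namespace scope: internal linkage

int get_one()
{
    return clash;
}
```

As with the unnamed namespace, the definition of clash should then show up as a LOCAL symbol in the object file and take no part in symbol resolution between object files.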

Since rewriting the source code like that is not an option, we have to modify the object files to the same effect. The tool for this is objcopy.

$ objcopy --localize-symbol=clash libonetwo.a libonetwo_good.a

This command has the same effect as running:

$ objcopy --localize-symbol=clash orig.o fixed.o

on each of the object files libonetwo.a(orig.o) to output a fixed object file fixed.o, and archiving all the fixed.o files in a new static library libonetwo_good.a. And the effect of --localize-symbol=clash, on each object file, is to change the linkage of the symbol clash, if defined, from external (GLOBAL) to internal (LOCAL):

$ readelf -s libonetwo_good.a | egrep '(File|Symbol|OBJECT|FUNC)'
File: libonetwo_good.a(one.o)
Symbol table '.symtab' contains 11 entries:
     9: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    3 clash
    10: 0000000000000000    16 FUNC    GLOBAL DEFAULT    1 _Z7get_onev
File: libonetwo_good.a(two.o)
Symbol table '.symtab' contains 10 entries:
     9: 0000000000000000    15 FUNC    GLOBAL DEFAULT    1 _Z7get_twov

Now the linker cannot see the LOCAL definition of clash in libonetwo_good.a(one.o).

That's sufficient to head off the multiple definition error, but since libfourfive.a has the same defect, we'll fix it too:

$ objcopy --localize-symbol=clash libfourfive.a libfourfive_good.a

And then we can relink prog successfully, using the fixed libraries.

$ g++ -o prog prog.o -L. -lonetwo_good -lfourfive_good
$ ./prog; echo $?
5

The Global Symbols Deadlock Variant

In this scenario, the sources and headers for libonetwo.a are:

one.cpp

#include <onetwo.h>
#include "priv_onetwo.h"

int inc_one()
{
    return inc(clash);
}

two.cpp

#include <onetwo.h>
#include "priv_onetwo.h"

int inc_two()
{
    return inc(clash + 1);
}

priv_onetwo.cpp

#include "priv_onetwo.h"

int clash = 1;

int inc(int i)
{
    return i + 1;
}

priv_onetwo.h

#pragma once

extern int clash;
extern int inc(int);

onetwo.h

#pragma once

extern int inc_one();
extern int inc_two();

And for libfourfive.a they are:

four.cpp

#include <fourfive.h>
#include "priv_fourfive.h"

int dec_four()
{
    return dec(clash);
}

five.cpp

#include <fourfive.h>
#include "priv_fourfive.h"

int dec_five()
{
    return dec(clash + 1);
}

priv_fourfive.cpp

#include "priv_fourfive.h"

int clash = 4;

int dec(int i)
{
    return i - 1;
}

priv_fourfive.h

#pragma once

extern int clash;
extern int dec(int);

fourfive.h

#pragma once

extern int dec_four();
extern int dec_five();

Each of these libraries is built with some common internals defined in a source file (priv_onetwo.cpp | priv_fourfive.cpp), and these internals are globally declared for building the library through a private header (priv_onetwo.h | priv_fourfive.h) that is not distributed with the library. They are undocumented symbols but nevertheless exposed to the linker.

Now there are two files in each library that make undefined (UND) references to the global symbol clash, which is defined in another file:

$ readelf -s libonetwo.a libfourfive.a | egrep '(File|Symbol|OBJECT|FUNC|clash)'
File: libonetwo.a(one.o)
Symbol table '.symtab' contains 13 entries:
     9: 0000000000000000    23 FUNC    GLOBAL DEFAULT    1 _Z7inc_onev
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND clash
File: libonetwo.a(two.o)
Symbol table '.symtab' contains 13 entries:
     9: 0000000000000000    26 FUNC    GLOBAL DEFAULT    1 _Z7inc_twov
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND clash
File: libonetwo.a(priv_onetwo.o)
Symbol table '.symtab' contains 11 entries:
     9: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 clash
    10: 0000000000000000    19 FUNC    GLOBAL DEFAULT    1 _Z3inci
File: libfourfive.a(four.o)
Symbol table '.symtab' contains 13 entries:
     9: 0000000000000000    23 FUNC    GLOBAL DEFAULT    1 _Z8dec_fourv
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND clash
File: libfourfive.a(five.o)
Symbol table '.symtab' contains 13 entries:
     9: 0000000000000000    26 FUNC    GLOBAL DEFAULT    1 _Z8dec_fivev
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND clash
File: libfourfive.a(priv_fourfive.o)
Symbol table '.symtab' contains 11 entries:
     9: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 clash
    10: 0000000000000000    19 FUNC    GLOBAL DEFAULT    1 _Z3deci

Our program source this time is:

prog.cpp

#include <onetwo.h>
#include <fourfive.h>

int main()
{
    return inc_one() + dec_four();
}

and:

$ g++ -Wall -Wextra -pedantic -I. -c prog.cpp
$ g++ -o prog prog.o -L. -lonetwo -lfourfive
/usr/bin/ld: ./libfourfive.a(priv_fourfive.o):(.data+0x0): multiple definition of `clash'; ./libonetwo.a(priv_onetwo.o):(.data+0x0): first defined here
collect2: error: ld returned 1 exit status

Once again clash is multiply defined. To resolve inc_one in main, the linker needed one.o, which obliged it to resolve inc (and clash), which made it need priv_onetwo.o, which contains the first definition of clash. To resolve dec_four in main, the linker needed four.o, which obliged it to resolve dec, which made it need priv_fourfive.o, which contains a rival definition of clash.

In this scenario, it isn't a coding error in either library that clash has external linkage. It needs to have external linkage. Localizing the definition of clash with objcopy in either of libonetwo.a(priv_onetwo.o) or libfourfive.a(priv_fourfive.o) will not work. If we do that, the linkage will succeed but output a bugged program, because the linker will resolve clash to the one surviving GLOBAL definition from the other object file: then either dec_four() will return 0 instead of 3 and dec_five() will return 1 instead of 4, or else inc_one() will return 5 instead of 2 and inc_two() will return 6 instead of 3. And if we localize both definitions, then no definition of clash will be found in the linkage of prog to satisfy the references in one.o or four.o, and it will fail with an undefined reference to clash.
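The wrong results quoted above can be checked with a little arithmetic. The sketch below is purely illustrative: the *_seeing functions are hypothetical stand-ins that take, as a parameter, whichever value of clash the linker would have bound the library code to.

```cpp
// Same bodies as inc() and dec() in the two libraries.
constexpr int inc(int i) { return i + 1; }
constexpr int dec(int i) { return i - 1; }

// Hypothetical stand-ins for the library functions, parameterised on the
// value of `clash` that each one ends up reading after linkage.
constexpr int inc_one_seeing(int clash)  { return inc(clash); }
constexpr int inc_two_seeing(int clash)  { return inc(clash + 1); }
constexpr int dec_four_seeing(int clash) { return dec(clash); }
constexpr int dec_five_seeing(int clash) { return dec(clash + 1); }

// Intended behaviour: each library reads its own definition.
static_assert(inc_one_seeing(1) == 2 && inc_two_seeing(1) == 3, "libonetwo");
static_assert(dec_four_seeing(4) == 3 && dec_five_seeing(4) == 4, "libfourfive");

// Only the definition in priv_onetwo.o (clash = 1) stays GLOBAL.
static_assert(dec_four_seeing(1) == 0 && dec_five_seeing(1) == 1, "bugged dec");

// Only the definition in priv_fourfive.o (clash = 4) stays GLOBAL.
static_assert(inc_one_seeing(4) == 5 && inc_two_seeing(4) == 6, "bugged inc");
```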

This time objcopy comes to the rescue again, but with a different option [1]:

$ objcopy --redefine-sym clash=clash_onetwo libonetwo.a libonetwo_good.a

The effect of this command is to create a new static library libonetwo_good.a, containing new object files that are pairwise the same as those in libonetwo.a, except that the symbol clash has been everywhere replaced with clash_onetwo:

$ readelf -s libonetwo_good.a | egrep '(File|Symbol|clash)'
File: libonetwo_good.a(one.o)
Symbol table '.symtab' contains 13 entries:
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND clash_onetwo
File: libonetwo_good.a(two.o)
Symbol table '.symtab' contains 13 entries:
    10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND clash_onetwo
File: libonetwo_good.a(priv_onetwo.o)
Symbol table '.symtab' contains 11 entries:
     9: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 clash_onetwo

We'll do the corresponding thing with libfourfive.a:

$ objcopy --redefine-sym clash=clash_fourfive libfourfive.a libfourfive_good.a

Now we're good to go once more:

$ g++ -o prog prog.o -L. -lonetwo_good -lfourfive_good
$ ./prog; echo $?
5
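If the library sources could be changed, the source-level counterpart of this renaming would be to put each library's private globals in a namespace of its own. A minimal single-file sketch follows; the namespace names onetwo_detail and fourfive_detail are assumptions made up for illustration, and the two libraries are collapsed into one translation unit only to keep the example short.

```cpp
// Private internals of libonetwo, in a library-specific namespace.
namespace onetwo_detail {
    int clash = 1;                    // still has external linkage
    int inc(int i) { return i + 1; }
}

// Private internals of libfourfive, in a different namespace.
namespace fourfive_detail {
    int clash = 4;                    // a distinct symbol from the one above
    int dec(int i) { return i - 1; }
}

int inc_one()
{
    return onetwo_detail::inc(onetwo_detail::clash);
}

int dec_four()
{
    return fourfive_detail::dec(fourfive_detail::clash);
}
```

Both variables keep external linkage, so other files of the same library can still refer to them, but their mangled names differ, so the linker has nothing to collide. With these definitions inc_one() + dec_four() is 2 + 3 = 5, matching the exit status above.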

Of the two solutions, use the fix for The Superfluous Global Symbols Variant if superfluous globals are what you've got, although the fix for The Global Symbols Deadlock Variant would also work. It is never desirable to tamper with object files between compilation and linkage; it can only be unavoidable or the lesser of evils. But if you're going to tamper with them, localizing a global symbol that should never have been global is a more transparent tampering than changing the name of a symbol to one that has no origin in source code.


[1] Don't forget that if you want to use objcopy with any option argument that is a symbol in a C++ object file, you have to use the mangled name of the C++ identifier that maps to the symbol. In this demo code it happens that the mangled name of the C++ identifier clash is also clash. But if, e.g., the fully qualified identifier had been onetwo::clash, its mangled name would be _ZN6onetwo5clashE, as reported by nm or readelf. Conversely, of course, if you wished to use objcopy to change _ZN6onetwo5clashE in an object file to a symbol that will demangle as onetwo::klash, then that symbol will be _ZN6onetwo5klashE.
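To double-check which identifier a mangled symbol corresponds to before handing it to objcopy, a small helper along these lines may be handy. It is a sketch that assumes a toolchain providing the Itanium C++ ABI header <cxxabi.h> (as GCC and Clang do on Linux); the function name demangle is our own.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <memory>
#include <string>

// Returns the demangled form of `symbol`, or `symbol` itself when it is not
// a valid mangled name.
std::string demangle(const char* symbol)
{
    int status = 0;
    // __cxa_demangle allocates the result with malloc, so release it with free.
    std::unique_ptr<char, void (*)(void*)> text(
        abi::__cxa_demangle(symbol, nullptr, nullptr, &status), std::free);
    return (status == 0 && text) ? std::string(text.get())
                                 : std::string(symbol);
}
```

For example, demangle("_ZN6onetwo5clashE") yields onetwo::clash and demangle("_Z7get_onev") yields get_one(). On the command line, c++filt or nm -C does the same job.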
