
C++: Compiler and Linker functionality

I want to understand exactly which parts of a program the compiler looks at and which parts the linker looks at. So I wrote the following code:

#include <iostream>
using namespace std;
#include <string>

class Test {
private:
    int i;

public:
    Test(int val) {i=val ;}
    void DefinedCorrectFunction(int val);
    void DefinedIncorrectFunction(int val);
    void NonDefinedFunction(int val);

    template <class paramType>
    void  FunctionTemplate (paramType val) { i = val }
};

void Test::DefinedCorrectFunction(int val)
{
    i = val;
}

void Test::DefinedIncorrectFunction(int val)
{
    i = val
}

void main()
{
    Test testObject(1);
    //testObject.NonDefinedFunction(2);
    //testObject.FunctionTemplate<int>(2);

}

I have four functions:

  • DefinedCorrectFunction - This is a normal function declared and defined correctly.
  • DefinedIncorrectFunction - This function is declared correctly but the implementation is wrong (missing ;)
  • NonDefinedFunction - Only declaration. No definition.
  • FunctionTemplate - A function template.

Now if I compile this code I get a compiler error for the missing ';' in DefinedIncorrectFunction.
Suppose I fix this and then uncomment testObject.NonDefinedFunction(2). Now I get a linker error. Now uncomment testObject.FunctionTemplate<int>(2). Now I get a compiler error for the missing ';'.

For function templates I understand that they are not touched by the compiler unless they are invoked in the code. So the compiler did not complain about the missing ';' until I called testObject.FunctionTemplate(2).

For testObject.NonDefinedFunction(2), the compiler did not complain but the linker did. As I understand it, all the compiler cared about was that a function named NonDefinedFunction is declared. It didn't care about the implementation. Then the linker complained because it could not find the implementation. So far so good.

Where I get confused is when the compiler complained about DefinedIncorrectFunction. It didn't look for the implementation of NonDefinedFunction, but it did go through DefinedIncorrectFunction.

So I'm a little unclear as to what exactly the compiler does and what the linker does. My understanding is that the linker links components with their calls. So when NonDefinedFunction is called, it looked for the compiled implementation of NonDefinedFunction and complained. But the compiler didn't care about the implementation of NonDefinedFunction, while it did care about that of DefinedIncorrectFunction.

I'd really appreciate if someone can explain this or provide some reference.

Thank you.

The job of the compiler is to compile the code that you have written and convert it into object files. So if you have missed a ; or used an undeclared variable, the compiler will complain, because it can detect these errors from the source file alone.

If the compilation proceeds without any hitch, the object files are produced. The object files have a complex structure but basically contain five things

  1. Headers - The information about the file
  2. Object Code - Code in machine language (This code cannot run by itself in most cases)
  3. Relocation Information - What portions of code will need to have addresses changed when the actual execution occurs
  4. Symbol Table - Symbols referenced by the code. They may be defined in this code, imported from other modules or defined by linker
  5. Debugging Info - Used by debuggers

The compiler compiles the code and fills the symbol table with every symbol it encounters. Symbols refer to both variables and functions. The answer to this question explains the symbol table:

This contains a collection of executable code and data that the linker can process into a working application or shared library. The object file has a data structure called a symbol table in it that maps the different items in the object file to names that the linker can understand.

The point to note:

If you call a function from your code, the compiler doesn't put the final address of the routine in the object file. Instead, it puts a placeholder value into the code and adds a note that tells the linker to look up the reference in the various symbol tables from all the object files it's processing and stick the final location there.

The generated object files are processed by the linker that will fill out the blanks in symbol tables, link one module to the other and finally give the executable code which can be loaded by the loader.
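The placeholder-and-note mechanism described above can be sketched as a toy model. This is only an illustration under loud assumptions: `Relocation`, `ToyObject` and `patch` are hypothetical names, the "code" is a plain array of numbers, and a real object file format is far richer than this.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// One "note to the linker": the slot at `offset` must receive the
// final address of `symbol`. (Hypothetical toy structure.)
struct Relocation {
    std::size_t offset;
    std::string symbol;
};

// Toy object file: a flat array of words plus the relocation notes.
struct ToyObject {
    std::vector<long> code;               // 0 marks a placeholder slot
    std::vector<Relocation> relocations;  // which slots still need an address
};

// Patch every placeholder with the final address of its symbol.
// If a symbol has no address, fail the way a linker would.
void patch(ToyObject& obj, const std::map<std::string, long>& addresses) {
    for (const Relocation& r : obj.relocations) {
        auto it = addresses.find(r.symbol);
        if (it == addresses.end())
            throw std::runtime_error("undefined reference to `" + r.symbol + "'");
        obj.code.at(r.offset) = it->second;
    }
}
```

Calling `patch` with an address map that lacks the symbol throws, which mirrors the linker's undefined reference error; calling it with a complete map overwrites the placeholder slot with the final address.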

So in your specific case -

  1. DefinedIncorrectFunction() - The compiler sees the definition of the function and begins compiling it to produce object code and insert the appropriate entry into the symbol table. Compilation fails because of the syntax error, so the compiler aborts with an error.
  2. NonDefinedFunction() - The compiler sees the declaration but no definition, so once the function is called it adds an entry to the symbol table and flags it for the linker to fill in (since the linker processes a bunch of object files, the definition may be present in some other object file). In your case you do not specify any other file, so the linker aborts with an undefined reference to NonDefinedFunction error because no object file defines that symbol.

To understand it further, let's say your code is structured as follows

File- try.h

#include<string>
#include<iostream>


class Test {
private:
    int i;

public:
    Test(int val) {i=val ;}
    void DefinedCorrectFunction(int val);
    void DefinedIncorrectFunction(int val);
    void NonDefinedFunction(int val);

    template <class paramType>
    void  FunctionTemplate (paramType val) { i = val; }
};

File- try.cpp

#include "try.h"


void Test::DefinedCorrectFunction(int val)
{
    i = val;
}

void Test::DefinedIncorrectFunction(int val)
{
    i = val;
}

int main()
{

    Test testObject(1);
    testObject.NonDefinedFunction(2);
    //testObject.FunctionTemplate<int>(2);
    return 0;
}

Let us first only compile and assemble the code, but not link it

$ g++ -c try.cpp -o try.o
$

This step proceeds without any problem. So you have the object code in try.o. Let's try and link it up

$ g++ try.o
try.o: In function `main':
try.cpp:(.text+0x52): undefined reference to `Test::NonDefinedFunction(int)'
collect2: ld returned 1 exit status

You forgot to define Test::NonDefinedFunction. Let's define it in a separate file.

File- try1.cpp

#include "try.h"

void Test::NonDefinedFunction(int val)
{
    i = val;
}

Let us compile it into object code

$ g++ -c try1.cpp -o try1.o
$

Again it is successful. Let us try to link only this file

$ g++ try1.o
/usr/lib/gcc/x86_64-redhat-linux/4.4.5/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
collect2: ld returned 1 exit status

No main, so it won't link!

Now you have two separate object codes that have all the components you need. Just pass BOTH of them to linker and let it do the rest

$ g++ try.o try1.o
$

No error! This is because the linker finds definitions of all the functions (even though they are scattered across different object files) and fills the blanks in the object code with the appropriate values.

I believe this is your question:

Where I get confused is when compiler complained about DefinedIncorrectFunction. It didn't look for implementation of NonDefinedFunction but it went through the DefinedIncorrectFunction.

The compiler tried to parse DefinedIncorrectFunction (because you provided a definition in this source file) and there was a syntax error (missing semicolon). On the other hand, the compiler never saw a definition for NonDefinedFunction because there simply was no code in this module. You might have provided a definition of NonDefinedFunction in another source file, but the compiler doesn't know that. The compiler only looks at one source file (and its included header files) at a time.
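A minimal sketch of the same point, using hypothetical names (`twice`, `call_twice`) that are not from the question. At the call site the compiler has only seen a declaration, and that is enough for it to accept the call. The definition is kept in the same file here purely so the sketch is self-contained; it could just as well sit in a different source file.

```cpp
#include <cassert>

// Declaration only: enough for the compiler to type-check calls.
int twice(int x);

int call_twice() {
    // No body for twice() has been seen at this point.
    // The compiler accepts the call based on the declaration alone.
    return twice(21);
}

// The definition. Moving it to another .cpp file changes nothing for
// the code above; deleting it everywhere leaves the call unresolved,
// which is typically reported at link time rather than compile time.
int twice(int x) { return 2 * x; }
```

With the definition present, `call_twice()` returns 42. If the definition is removed from every file that is linked, the translation unit still compiles and the failure moves to the link step.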

Say you want to eat some soup, so you go to a restaurant.

You search the menu for soup. If you don't find it in the menu, you leave the restaurant. (kind of like a compiler complaining it couldn't find the function) If you find it, what do you do?

You call the waiter to go get you some soup. However, just because it's in the menu, doesn't mean that they also have it in the kitchen. Could be an outdated menu, it could be that someone forgot to tell the chef that he's supposed to make soup. So again, you leave. (like an error from the linker that it couldn't find the symbol)

The compiler checks that the source code conforms to the syntax and semantics of the language. The output of the compiler is object code.

The linker links the different object modules together to form an executable. The definitions of functions are located in this phase, and the calls to them are bound to those definitions.

The compiler compiles code in the form of translation units. It will compile all the code that is included in a source .cpp file.
DefinedIncorrectFunction() is defined in your source file, so the compiler checks it for language validity.
NonDefinedFunction() does not have any definition in the source file, so the compiler does not need to compile it. If the definition is present in some other source file, the function will be compiled as part of that translation unit and the linker will link to it. If the linker does not find the definition at the linking stage, it will raise a linking error.

What the compiler does, and what the linker does, depends on the implementation: a legal implementation could just store the tokenized source in the “compiler”, and do everything in the linker. Modern implementations do put off more and more to the linker, for better optimization. And many early implementations of templates didn't even look at the template code until link time, other than matching braces enough to know where the template ended. From a user point of view, you're more interested in whether the error “requires a diagnostic” (which can be emitted by the compiler or the linker) or is undefined behavior.

In the case of DefinedIncorrectFunction, you have provided source text which the implementation is required to parse. That text contains an error for which a diagnostic is required. In the case of NonDefinedFunction: if the function is used, failure to provide a definition (or providing more than one definition) in the complete program is a violation of the one definition rule, which is undefined behavior. No diagnostic is required (but I can't imagine an implementation that didn't provide one for a missing definition of a function that was used).

In practice, errors which can be easily detected simply by examining the text input of a single translation unit are defined by the standard to “require a diagnostic”, and will be detected by the compiler. Errors which cannot be detected by the examination of a single translation unit (e.g. a missing definition, which might be present in a different translation unit) are formally undefined behavior. In many cases, these errors can be detected by the linker, and in such cases, implementations will in fact emit an error.

This is somewhat modified in cases like inline functions, where you're allowed to repeat the definition in each translation unit, and extremely modified by templates, since many errors cannot be detected until instantiation. In the case of templates, the standard leaves implementations a great deal of freedom: at the least, the compiler must parse the template enough to determine where the template ends. The standard added things like typename, however, to allow much more parsing before instantiation. In dependent contexts, however, some errors cannot possibly be detected before instantiation, which may take place at compilation time or at link time. Early implementations favored link time instantiation; compile time instantiation dominates today, and is used by VC++ and g++.
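To illustrate an error in a dependent context, here is a small sketch with hypothetical names (`Counter`, `broken`, `set`, `run_demo`), none of which come from the question. The body of `broken` calls a member that may or may not exist depending on `T`, so the compiler cannot reject it until the template is instantiated with a concrete type.

```cpp
#include <cassert>

struct Counter {
    int value = 0;

    // Dependent expression: whether `arg` has a member function called
    // no_such_member() is unknown until T is known, so this body is
    // accepted as long as the template is never instantiated.
    template <class T>
    void broken(T arg) { value = arg.no_such_member(); }

    // A well-formed member template for comparison.
    template <class T>
    void set(T arg) { value = static_cast<int>(arg); }
};

int run_demo() {
    Counter c;
    c.set(41.0);      // instantiates set<double>, which is fine
    // c.broken(1);   // uncommenting instantiates broken<int> and
    //                // produces a compile error: int has no members
    return c.value + 1;
}
```

As written, `run_demo()` returns 42 and the faulty body is never diagnosed. Note that this differs from the missing semicolon in the question: a plain syntax error inside a template may be reported even without instantiation, depending on how much of the template the compiler parses up front.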

The missing semi-colon is a syntax error and therefore the code should not compile. This might happen even in a template implementation. Essentially, there is a parsing stage and whilst it is obvious to a human how to "fix and recover" a compiler doesn't have to do that. It can't just "imagine the semi-colon is there because that's what you meant" and continue.

A linker looks for function definitions to call where they are required. The definition isn't required here (the call is commented out), so there is no complaint. There is no error in this file as such, because even if the definition were required, it might not be implemented in this particular compilation unit. The linker is responsible for collecting together different compilation units, i.e. "linking" them.

Ah, but you could have NonDefinedFunction(int) in another compilation unit.

The compiler produces some output for the linker that basically says the following (among other things):

  • Which symbols (functions/variables/etc) are defined.
  • Which symbols are referenced but undefined. In this case the linker needs to resolve the references by searching through the other modules being linked. If it can't, you get a linker error.
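The two lists above are enough to model how unresolved references are found. The following is a toy sketch, not how any real linker is implemented: `ObjectFile` and `unresolved_symbols` are hypothetical names, and symbols are reduced to plain strings.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy view of an object file: just the two symbol lists.
struct ObjectFile {
    std::string name;                  // e.g. "try.o" (label only)
    std::set<std::string> defined;     // symbols this file provides
    std::set<std::string> referenced;  // symbols this file uses
};

// Return every referenced symbol that no object file defines.
// An empty result means the toy "link" succeeds.
std::vector<std::string> unresolved_symbols(const std::vector<ObjectFile>& objects) {
    // Pass 1: collect the definitions from all object files.
    std::set<std::string> all_defined;
    for (const ObjectFile& obj : objects)
        all_defined.insert(obj.defined.begin(), obj.defined.end());

    // Pass 2: any reference without a definition is unresolved.
    std::set<std::string> missing;
    for (const ObjectFile& obj : objects)
        for (const std::string& sym : obj.referenced)
            if (all_defined.count(sym) == 0)
                missing.insert(sym);

    return std::vector<std::string>(missing.begin(), missing.end());
}
```

Feeding it a single object file that references `Test::NonDefinedFunction(int)` without defining it yields one unresolved symbol; adding a second object file that defines the symbol yields an empty list.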

The linker is there to link in code defined (possibly) in external modules - libraries or object files you will use together with this particular source file to generate the complete executable. So, if you have a declaration but no definition, your code will compile because the compiler knows the linker might find the missing code somewhere else and make it work. Therefore, in this case you will get an error from the linker, not the compiler.

If, on the other hand, there's a syntax error in your code, the compiler can't compile it at all and you will get an error at this stage. Macros and templates may behave a bit differently, not causing errors if they are not used (in this respect templates behave somewhat like macros with a nicer interface), but it also depends on the severity of the error. If you mess up so badly that the compiler can't figure out where the templated/macro code ends and regular code starts, it won't be able to compile.

With regular code, the compiler must compile even dead code (code not referenced in your source file) because someone might want to use that code from another source file, by linking your .o file to his code. Therefore non-templated/macro code must be syntactically correct even if it is not directly used in the same source file.
