
How do I create a static library with GCC compiler optimization?

I have a function that computes a dot product in C++. I want to compile this function with -O3 compiler optimization, while the rest of the code in my codebase is compiled with -O0. To do this, I created a static library that contains the function and compiled the library with -O3. Then I linked the library to my code. But I am not getting the benefit of the optimization from my library.

test.cpp

#include "config.h"
int multiply(uint128 *X1, uint128 *Y1, uint128 &ans, int input_length)
{
    int i=0;
    ans = 0;
    if (input_length > 4)
    {
        for (; i < input_length - 4; i += 4)
        {
            ans += X1[i] * Y1[i];
            ans += X1[i + 1] * Y1[i + 1];
            ans += X1[i + 2] * Y1[i + 2];
            ans += X1[i + 3] * Y1[i + 3];
        }
    }
    for (; i < input_length; i++)
    {
        ans += X1[i] * Y1[i];
    }    
    return 0;
}
int main()
{
    int len = 500, wrapper = 50;
    uint128 a[len], b[len], ans;
    auto start = time_now, end = time_now;
    long long ctr = 0;
    for(int t = 0; t < wrapper; t++)
    {
        for(int i =0; i < len; i++)
        {
            a[i] = rand();
            b[i] = rand();
        }
        start = time_now;
        multiply(a, b, ans, len);
        end = time_now;
        ctr += std::chrono::duration_cast<std::chrono::nanoseconds>(end-start).count();
    }
    cout<<"time taken: "<<ctr<<endl;
}
Compilation:
g++ -O3 test.cpp -std=c++11
./a.out
time taken: 1372

optimized.hpp

#ifndef OPTIMIZED_HPP
#define OPTIMIZED_HPP
#include <bits/stdc++.h>
using namespace std;
typedef __uint128_t uint128;


int multiply(uint128 *X1, uint128 *Y1, uint128 &ans, int input_length);
#endif

main.cpp

#include "optimized.hpp"
typedef __uint128_t uint128;

#define time_now std::chrono::high_resolution_clock::now()

int main()
{

    int len = 500, wrapper = 50;
    uint128 a[len], b[len], ans;

    auto start = time_now, end = time_now;
    long long ctr = 0;

    for(int t = 0; t < wrapper; t++)
    {
        for(int i =0; i < len; i++)
        {
            a[i] = rand();
            b[i] = rand();
        }

        start = time_now;
        multiply(a, b, ans, len);
        end = time_now;
        ctr += std::chrono::duration_cast<std::chrono::nanoseconds>(end-start).count();
    }
    cout<<"time taken: "<<ctr<<endl;

    return 0;
}
Compilation:
(the name of the library file is optimized.cpp)
g++ -O3 -g -std=c++11 -c optimized.cpp
ar rcs libfast.a optimized.o
g++ main.cpp libfast.a -std=c++11 -O3
./a.out 
time taken: 36140

I'm afraid you've perpetrated the classic foot-shoot: "I optimised away the function I was trying to time". We all did it once.

You've compiled the test.cpp program with -O3. The compiler observes that the program ignores the return value of multiply. It also has the definition of multiply to hand and observes that the body has no external effects other than the ignored return value (in any case the constant 0) and the ignored reference parameter ans. So the line:

multiply(a, b, ans, len);

might as well be:

(void)0;

and the optimiser culls it.

You can rectify this by amending the program so that it uses the external effects of multiply (such as the value written to ans) in a way that affects the output of the program and can't be predicted just by knowing the definition of the function.
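As a rough sketch of that idea, the benchmark loop below folds each ans into a checksum that the caller is expected to print, so the call to multiply has an observable effect. BenchResult and run_benchmark are hypothetical names introduced for illustration, multiply is written without the manual unrolling from the question, and std::vector and steady_clock are substituted for the original arrays and time_now macro.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <vector>

typedef __uint128_t uint128;

// Dot product as in the question, minus the manual unrolling.
int multiply(const uint128 *x, const uint128 *y, uint128 &ans, int n)
{
    ans = 0;
    for (int i = 0; i < n; i++)
        ans += x[i] * y[i];
    return 0;
}

// Hypothetical result type: total time plus a value derived from every ans.
struct BenchResult
{
    long long nanoseconds;
    unsigned long long checksum;
};

// Hypothetical helper: times `rounds` calls of multiply on random inputs.
BenchResult run_benchmark(int len, int rounds)
{
    assert(len > 0 && rounds > 0);
    std::vector<uint128> a(len), b(len);
    BenchResult result = {0, 0};
    for (int t = 0; t < rounds; t++)
    {
        for (int i = 0; i < len; i++)
        {
            a[i] = std::rand();
            b[i] = std::rand();
        }
        uint128 ans = 0;
        auto start = std::chrono::steady_clock::now();
        multiply(a.data(), b.data(), ans, len);
        auto end = std::chrono::steady_clock::now();
        result.nanoseconds +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
        // Fold the low 64 bits of ans into the checksum so the result is used.
        result.checksum ^= static_cast<unsigned long long>(ans);
    }
    return result;
}
```

Calling run_benchmark(500, 50) from main and printing both nanoseconds and checksum makes the program's output depend on every product. This only keeps the call alive; the optimiser is still free to inline multiply when its definition is visible.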

But that's not enough. -O3 optimisation of test.cpp, where the compiler can see the definition of multiply, is still going to have a major information advantage over the same optimisation of main.cpp, where multiply is just an external reference to a black box.

To measure meaningfully the speed of multiply with -O3 optimisation against its speed with -O0 optimisation, you must measure in each case with other things being equal. So at least do this:

(Inadvertently, my test.cpp is your main.cpp.)

$ g++ -Wall -Wextra -pedantic -c -O0 -o mult_O0.o optimized.cpp
$ g++ -Wall -Wextra -pedantic -c -O3 -o mult_O3.o optimized.cpp
$ g++ -Wall -Wextra -pedantic -c -O3 -o test.o test.cpp
test.cpp: In function ‘int main()’:
test.cpp:13:13: warning: ISO C++ forbids variable length array ‘a’ [-Wvla]
   13 |     uint128 a[len], b[len], ans;
      |             ^
test.cpp:13:21: warning: ISO C++ forbids variable length array ‘b’ [-Wvla]
   13 |     uint128 a[len], b[len], ans;
      |                     ^

(You might care about those warnings.)
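If you do care, one possible way to avoid the variable-length arrays is to put the buffers in std::vector. The following is only a sketch; random_values is a hypothetical helper name, not something from the original code.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

typedef __uint128_t uint128;

// Hypothetical helper: returns `len` random values in a heap-allocated
// buffer, replacing the non-standard `uint128 a[len]` array.
std::vector<uint128> random_values(int len)
{
    assert(len >= 0);
    std::vector<uint128> values(len);
    for (auto &v : values)
        v = std::rand();
    return values;
}
```

In main, `std::vector<uint128> a = random_values(len);` would then stand in for the array, and `a.data()` can be passed wherever multiply expects a `uint128 *`.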

$ g++ -o tmult_O0 test.o mult_O0.o
$ g++ -o tmult_O3 test.o mult_O3.o
$ ./tmult_O0
time taken: 228461
$ ./tmult_O3
time taken: 99092

which shows that -O3 is doing its stuff: in this run the -O3 build of multiply is roughly 2.3 times faster than the -O0 build.

And if you do it like that, you don't need to make sure you use the effects of multiply in test.cpp, because now the compiler knows that it cannot know the effects of the black box multiply(a, b, ans, len) and cannot cull it.
