I've just learnt SIMD programming in C++. Addition and subtraction were quite easy, but I'm having trouble with matrix multiplication.
I compile it with: gcc -o auto-vector auto-vector.cpp -lstdc++
It compiles, but when I run it, it prints: Elapsed time: 3e-06 s *** stack smashing detected ***: terminated
It reports stack smashing, but it also prints the elapsed time.
Did my code compile correctly?
//gcc -o auto-vector auto-vector.cpp -lstdc++
#include "xmmintrin.h"
#include <chrono> // for high_resolution_clock
#include <iostream>
int main()
{
    float A[4][4] = {{16, 2, 77, 40}, {16, 2, 77, 40}, {16, 2, 77, 40}, {16, 2, 77, 40}};
    float B[4][4] = {{16, 2, 77, 40}, {16, 2, 77, 40}, {16, 2, 77, 40}, {16, 2, 77, 40}};
    float C[4][4] = {};
    __m128 a_vec, b_vec, c_vec;
    int N = 4;
    // Record start time
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < N; i++)
    {
        for (int j = 0; j < N; j++)
        {
            c_vec = _mm_set1_ps(0);
            for (int k = 0; k < N; k++)
            {
                a_vec = _mm_set1_ps(A[i][k]);
                b_vec = _mm_loadu_ps(&B[k][j]);
                c_vec = _mm_add_ps(_mm_mul_ps(a_vec, b_vec), c_vec);
            }
            _mm_storeu_ps(&C[i][j], c_vec);
        }
    }
    // Record end time
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "Elapsed time: " << elapsed.count() << " s\n";
    return 0;
}
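That means you stored past the end of a stack array, and you got lucky that the array was at the top of your stack frame, so -fstack-protector-strong could catch the bug for you instead of the store silently stepping on other locals. The check only runs when main returns, which is why the elapsed time is still printed first.
For example, a 16-byte store to C[3][3] writes 4 floats, C[3][3..6], and the last three of those are past the end of the array.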
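The overrun can be checked with plain index arithmetic. The following is only an illustrative sketch: `floats_past_end` is a hypothetical helper (not part of the original code) that assumes a row-major N x N float array and a 4-float (16-byte) store starting at C[i][j].

```cpp
#include <cassert>

// Hypothetical helper, for illustration only: number of floats that a
// 4-float (16-byte) store starting at C[i][j] writes beyond the end of a
// row-major N x N float array.
int floats_past_end(int N, int i, int j)
{
    assert(N > 0 && 0 <= i && i < N && 0 <= j && j < N);
    int one_past_last_written = i * N + j + 4; // flat index after the store
    int array_size = N * N;                    // flat size of the array
    return one_past_last_written > array_size
               ? one_past_last_written - array_size
               : 0;
}
```

With N = 4, a store at C[3][3] starts at flat index 15 and covers indices 15 to 18, while the array ends at index 15, so the helper returns 3. Stores at C[i][1..3] in earlier rows stay inside the array but spill into the next row, which is also not what the loop intends.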
You're vectorizing over j, so each iteration computes 4 adjacent elements of C, but you forgot to increment j by 4 instead of 1, and to use j < N - 3 as your loop bound.
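A minimal sketch of the loop with both changes applied might look like the following. The function name `matmul_sse` and the flat row-major pointer interface are assumptions made for this example, not part of the original code, and the sketch assumes N is a multiple of 4 (it has no scalar cleanup loop for leftover columns).

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE intrinsics (__m128, _mm_*_ps)

// Hypothetical helper for illustration: C = A * B for row-major N x N
// float matrices. Assumes N is a multiple of 4.
void matmul_sse(const float* A, const float* B, float* C, int N)
{
    assert(N % 4 == 0); // no scalar cleanup loop in this sketch
    for (int i = 0; i < N; i++)
    {
        // Each iteration produces C[i][j..j+3], so step j by 4 and stop
        // while a full 4-float vector still fits in the row.
        for (int j = 0; j < N - 3; j += 4)
        {
            __m128 c_vec = _mm_setzero_ps();
            for (int k = 0; k < N; k++)
            {
                __m128 a_vec = _mm_set1_ps(A[i * N + k]);   // broadcast A[i][k]
                __m128 b_vec = _mm_loadu_ps(&B[k * N + j]); // load B[k][j..j+3]
                c_vec = _mm_add_ps(_mm_mul_ps(a_vec, b_vec), c_vec);
            }
            _mm_storeu_ps(&C[i * N + j], c_vec); // store stays inside row i
        }
    }
}
```

For the 4x4 arrays in the question it could be called as matmul_sse(&A[0][0], &B[0][0], &C[0][0], 4). With N = 4 the j loop runs exactly once per row, so every load and store covers one whole row and nothing is written past the end of C.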
Also, if you want meaningful timing results, make sure to compile with gcc -O3. Or preferably gcc -O3 -march=native -ffast-math -flto, and use profile-guided optimization with -fprofile-generate / -fprofile-use.
Also, you're testing manual vectorization, but your comment says "auto-vector".
Coding style: declare your __m128 variables where you first use them, like
__m128 c_vec = _mm_setzero_ps();