
Stack Smashing Detected in Matrix Multiplication C++ SIMD Programming

I've just learned SIMD programming in C++ and managed addition and subtraction quite easily, but I'm having trouble with matrix multiplication.

I compile it with: gcc -o auto-vector auto-vector.cpp -lstdc++

It compiles, but when I run it, the output is: Elapsed time: 3e-06 s *** stack smashing detected ***: terminated

So it reports stack smashing, yet it still prints the elapsed time.

Did my code actually compile correctly?

//gcc -o auto-vector auto-vector.cpp -lstdc++
#include "xmmintrin.h"
#include <chrono> // for high_resolution_clock
#include <iostream>

int main()
{
  float A[4][4] = {{16, 2, 77, 40}, {16, 2, 77, 40}, {16, 2, 77, 40}, {16, 2, 77, 40}};
  float B[4][4] = {{16, 2, 77, 40}, {16, 2, 77, 40}, {16, 2, 77, 40}, {16, 2, 77, 40}};
  float C[4][4] = {};

  __m128 a_vec, b_vec, c_vec;
  int N = 4;

  // Record start time
  auto start = std::chrono::high_resolution_clock::now();

  for (int i = 0; i < N; i++)
  {
    for (int j = 0; j < N; j++)
    {
      c_vec = _mm_set1_ps(0);

      for (int k = 0; k < N; k++)
      {
        a_vec = _mm_set1_ps(A[i][k]);
        b_vec = _mm_loadu_ps(&B[k][j]);

        c_vec = _mm_add_ps(_mm_mul_ps(a_vec, b_vec), c_vec);
      }

      _mm_storeu_ps(&C[i][j], c_vec);
    }
  }

  // Record end time
  auto finish = std::chrono::high_resolution_clock::now();

  std::chrono::duration<double> elapsed = finish - start;

  std::cout << "Elapsed time: " << elapsed.count() << " s\n";

  return 0;
}

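That means you stored past the end of a stack array. You got lucky that the array was at the top of your stack frame, so -fstack-protector-strong (enabled by default in many Linux distributions' GCC builds) could catch the bug for you instead of the store just stepping on other locals. The canary is only checked when main returns, which is why the elapsed time is printed before the program aborts. So yes, your code compiled; the bug shows up at run time.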

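For example, a 16-byte store to C[3][3] writes C[3][3] through C[3][6], three floats past the end of C. (For rows 0 to 2 the extra floats land in the next row, so only row 3 leaves the array. The _mm_loadu_ps(&B[k][j]) loads read out of bounds in the same way.)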

You're vectorizing over j by hand, but you forgot to increment j by 4 instead of 1, and to use j < N - 3 as the loop bound so that the last vector, C[i][j .. j+3], still fits inside the row.


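Also, if you want meaningful timing results, make sure to compile with gcc -O3, or preferably gcc -O3 -march=native -ffast-math -flto plus -fprofile-generate / -fprofile-use. Keep in mind that once optimization is on, the compiler is free to delete the whole loop, because C is never read afterwards; use the result (for example, print it) so the work can't be optimized away.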

Also, you're testing manual vectorization, but your comment says "auto-vector".


Coding style: declare your __m128 variables where you first use them, like
__m128 c_vec = _mm_setzero_ps();
