C++ Matrix Multiplication Auto-Vectorization

I have auto-vectorization enabled. When I compile the code, I receive the following warning:

info C5002: loop not vectorized due to reason '1203'

MSDN describes this reason code as:

Loop body includes non-contiguous accesses into an array.

I've looked into these links (1, 2) for help, but have had no luck.

Here is my source code:

for (int row = 0; row < size; ++row) {
    for (int col = 0; col < size; ++col) {
        float tmp = 0;
        for (int i = 0; i < size; ++i) { // This loop generates the warning above
            tmp += matrixA[row][i] * matrixB[i][col];
        }
        matrixResult[row][col] = tmp;
    }
}

Any help is welcomed.

2D arrays are stored as a single contiguous block of memory, so a 3x2 2D array is actually 6 elements laid out end to end.

The [] indexing operators simply calculate which element to access.

So what's happening here is that matrixA is being accessed sequentially, from element 1 through to element 6 (i.e. A1, A2, A3, B1, B2, B3).

matrixB, however, is being accessed with a stride: A1, B1, A2, B2 and so on, which maps onto the actual storage as accessing elements 1, then 4, then 2, then 5.
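To make the two access patterns concrete, here is a minimal sketch that computes the flat offsets the inner loop touches, assuming a square, row-major matrix and 0-based offsets. The helper names flatIndex, offsetsOfA and offsetsOfB are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Row-major offset of element [row][col] in a matrix with `cols` columns.
// Hypothetical helper, 0-based, for illustration only.
std::size_t flatIndex(std::size_t row, std::size_t col, std::size_t cols) {
    return row * cols + col;
}

// Offsets read by matrixA[row][i] as i runs over the inner loop.
std::vector<std::size_t> offsetsOfA(std::size_t row, std::size_t size) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < size; ++i) {
        offsets.push_back(flatIndex(row, i, size));
    }
    return offsets;
}

// Offsets read by matrixB[i][col] as i runs over the inner loop.
std::vector<std::size_t> offsetsOfB(std::size_t col, std::size_t size) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < size; ++i) {
        offsets.push_back(flatIndex(i, col, size));
    }
    return offsets;
}
```

For size = 3, offsetsOfA(0, 3) yields 0, 1, 2 (stride 1), while offsetsOfB(0, 3) yields 0, 3, 6 (stride size). The second pattern is the non-contiguous access the vectorizer reports.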

You can't change the order in which this loop accesses the elements of matrixB, but you could transpose matrixB so that its elements are stored in the order they are accessed. Obviously, if you only do this multiplication once, it might not be worth the effort to recalculate matrixB's ordering, but if you are performing this calculation repeatedly, then the effort will be very much worth it.
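The transposition idea can be sketched as follows. This is an illustration, not the asker's actual setup: it assumes square matrices stored as std::vector<std::vector<float>>, and the alias Matrix and the functions transpose and multiplyUsingTranspose are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Assumed storage for this sketch; the question does not show the real type.
using Matrix = std::vector<std::vector<float>>;

// Returns a copy of a square matrix with rows and columns swapped.
Matrix transpose(const Matrix& m) {
    const std::size_t size = m.size();
    Matrix t(size, std::vector<float>(size));
    for (std::size_t r = 0; r < size; ++r) {
        for (std::size_t c = 0; c < size; ++c) {
            t[c][r] = m[r][c];
        }
    }
    return t;
}

// Computes a * b, reading b through its transpose so that the inner loop
// walks along a row of both operands.
Matrix multiplyUsingTranspose(const Matrix& a, const Matrix& b) {
    const std::size_t size = a.size();
    const Matrix bT = transpose(b);
    Matrix result(size, std::vector<float>(size, 0.0f));
    for (std::size_t row = 0; row < size; ++row) {
        for (std::size_t col = 0; col < size; ++col) {
            float tmp = 0.0f;
            for (std::size_t i = 0; i < size; ++i) {
                tmp += a[row][i] * bT[col][i]; // both accesses have stride 1
            }
            result[row][col] = tmp;
        }
    }
    return result;
}
```

The inner loop is still a floating-point reduction, so even with contiguous accesses a compiler may need relaxed floating-point settings (for MSVC, likely /fp:fast) before it vectorizes it.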

If matrixA and matrixB have the same storage order (e.g. row-major), then the inner loop as written cannot read both contiguously: walking along a row of one means walking down a column of the other. So the warning is plausible.

A word of advice: if you want serious high-performance computing, you should give up on 2D arrays. The gain from better caching is far bigger than the speed-up from vectorization.
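One reading of this advice is to keep each matrix in a single flat buffer and compute the index by hand. The sketch below assumes that interpretation, with square row-major matrices held in std::vector<float>; the function name multiplyFlat is hypothetical. It keeps the loop order from the question on purpose, so only the storage changes.

```cpp
#include <cstddef>
#include <vector>

// Multiplies two size x size row-major matrices stored in flat buffers.
// Element [row][col] lives at index row * size + col.
std::vector<float> multiplyFlat(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t size) {
    std::vector<float> result(size * size, 0.0f);
    for (std::size_t row = 0; row < size; ++row) {
        for (std::size_t col = 0; col < size; ++col) {
            float tmp = 0.0f;
            for (std::size_t i = 0; i < size; ++i) {
                // a is read with stride 1, b with stride size.
                tmp += a[row * size + i] * b[i * size + col];
            }
            result[row * size + col] = tmp;
        }
    }
    return result;
}
```

With one allocation per matrix, consecutive rows sit next to each other in memory, which is where the caching benefit would come from. The strided read of b remains, so this change alone is not expected to silence the 1203 message.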

One way to get contiguous access is to swap the two inner loops: instead of for row, for col, for i, you have for row, for i, for col. See the resulting code below. Now both matrixResult and matrixB are accessed along col, so the accesses are contiguous. Note that matrixResult must be zero-initialized beforehand, because the loop accumulates into it.

for (int row = 0; row < size; ++row) {
    for (int i = 0; i < size; ++i) {
        float a_row_i = matrixA[row][i]; // float, not int: an int would truncate the value
        for (int col = 0; col < size; ++col) {
            matrixResult[row][col] += a_row_i * matrixB[i][col];
        }
    }
}
