
How to get SIMD code from C code

I am working on a machine with an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz, which supports SSE4.2.

I have written C code that XORs two bit strings. I want to write the corresponding SIMD code and check for a performance improvement. Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define LENGTH 10

unsigned char xor_val[LENGTH];

void oper_xor(unsigned char *r1, unsigned char *r2)
{
    unsigned int i;
    for (i = 0; i < LENGTH; ++i)
    {
        xor_val[i] = (unsigned char)(r1[i] ^ r2[i]);
        printf("%d",xor_val[i]);
    }
}

int main() {

    int i;
    clock_t start, stop;   /* clock() returns clock_t, not time_t */
    double cur_time;
    start = clock();
    oper_xor((unsigned char *)"1110001111", (unsigned char *)"0000110011");
    stop = clock();
    cur_time = (double)(stop - start) / CLOCKS_PER_SEC;

    printf("Time used %f seconds.\n", cur_time);
    for (i = 0; i < LENGTH; ++i)
        printf("%d",xor_val[i]);
    printf("\n");
    return 0;
}

Compiling and running this sample gives the output shown below. The measured time is zero here, but in the actual project this operation takes a significant amount of time.

gcc xor_scalar.c -o xor_scalar
pan88: ./xor_scalar
1110111100 Time used 0.000000 seconds.
1110111100

How can I start writing the corresponding SIMD code for SSE4.2?

The Intel Compiler supports #pragma simd, and any compiler that implements OpenMP 4.0 or later supports #pragma omp simd. These are your best bet for getting the compiler to generate SIMD code for you. If that fails, you can use intrinsics or, as a last resort, inline assembly.
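
As a rough illustration of the pragma approach, here is a minimal sketch of the XOR loop annotated with #pragma omp simd. The function name xor_bytes and its signature are hypothetical (the question's code uses a global array and a fixed LENGTH), and the restrict qualifiers assume the three buffers never overlap.

```c
#include <stddef.h>

/* XOR two byte buffers element-wise into dst.
 * xor_bytes is a hypothetical name, not part of the original code.
 * restrict promises the compiler that dst, a and b do not overlap. */
void xor_bytes(unsigned char *restrict dst,
               const unsigned char *restrict a,
               const unsigned char *restrict b,
               size_t n)
{
    /* Request vectorization of this loop (OpenMP 4.0 or later).
     * If OpenMP is not enabled, the pragma is ignored and the
     * loop still produces the same result. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        dst[i] = (unsigned char)(a[i] ^ b[i]);
}
```

With GCC, something like gcc -O2 -fopenmp-simd -msse4.2 should make the pragma take effect without pulling in the OpenMP runtime; whether the loop is actually vectorized still depends on the compiler version and flags, so it is worth checking the generated assembly or the vectorization report.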

Note that the printf function calls will almost certainly interfere with vectorization, so you should remove them from any loops in which you want to see SIMD.
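
If the compiler does not vectorize the loop on its own, the intrinsics fallback could look roughly like the sketch below, with all printing kept outside the loop. It uses only SSE2 intrinsics, which any SSE4.2-capable CPU also supports. The function name xor_bytes_sse2 is hypothetical, and unaligned loads and stores are used on the assumption that nothing is known about buffer alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_loadu_si128, _mm_xor_si128, ... */
#include <stddef.h>

/* XOR two byte buffers into dst, 16 bytes per iteration.
 * xor_bytes_sse2 is a hypothetical name, not part of the original code. */
void xor_bytes_sse2(unsigned char *dst,
                    const unsigned char *a,
                    const unsigned char *b,
                    size_t n)
{
    size_t i = 0;

    /* Main loop: process full 16-byte blocks with 128-bit registers. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_xor_si128(va, vb));
    }

    /* Scalar tail: handle the remaining n % 16 bytes. */
    for (; i < n; ++i)
        dst[i] = (unsigned char)(a[i] ^ b[i]);
}
```

Note that with LENGTH set to 10, as in the question, only the scalar tail runs, so any speedup would show up only on buffers of at least 16 bytes, and realistically on much larger ones.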
