How to get SIMD code from C code

I am working on a machine with an Intel(R) Xeon(R) CPU E5-2640 v2 @ 2.00GHz. It supports SSE4.2.

I have written C code that performs an XOR operation over the bits of two strings. I would like to write the corresponding SIMD code and check whether performance improves. Here is my code:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#define LENGTH 10

unsigned char xor_val[LENGTH];

void oper_xor(unsigned char *r1, unsigned char *r2)
{
    unsigned int i;
    for (i = 0; i < LENGTH; ++i)
    {
        xor_val[i] = (unsigned char)(r1[i] ^ r2[i]);
        printf("%d", xor_val[i]);
    }
}

int main() {

    int i;
    clock_t start, stop;   /* clock() returns clock_t, not time_t */
    double cur_time;
    start = clock();
    oper_xor("1110001111", "0000110011");
    stop = clock();
    cur_time = ((double) stop-start) / CLOCKS_PER_SEC;

    printf("Time used %f seconds.\n", cur_time / 100);
    for (i = 0; i < LENGTH; ++i)
        printf("%d",xor_val[i]);
    printf("\n");
    return 0;
}

When I compile and run this sample code, I get the output shown below. The time is reported as 0 here, but in the actual project this operation takes a significant amount of time.

gcc xor_scalar.c -o xor_scalar
pan88: ./xor_scalar
1110111100 Time used 0.000000 seconds.
1110111100

How can I start writing the corresponding SIMD code for SSE4.2?

The Intel Compiler and any compiler that implements OpenMP 4.0 or later support #pragma simd and #pragma omp simd, respectively. These are your best bet to get the compiler to do SIMD codegen for you. If that fails, you can use intrinsics or, as a last resort, inline assembly.
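As a minimal sketch of the pragma route, the XOR loop from the question could be written as below. The function name xor_bytes, its signature, and the use of restrict are illustrative assumptions, not part of the original code; the pragma only takes effect if the compiler is asked to honor it (for GCC or Clang, -fopenmp-simd or -fopenmp), and whether the loop is actually vectorized still depends on the compiler and optimization level.

```c
#include <stddef.h>

/* Hypothetical helper: XOR n bytes of a and b into out.
 * restrict tells the compiler the buffers do not overlap,
 * which removes one common obstacle to vectorization. */
void xor_bytes(unsigned char *restrict out,
               const unsigned char *restrict a,
               const unsigned char *restrict b,
               size_t n)
{
    /* Ask the compiler to vectorize this loop (OpenMP 4.0 simd construct).
     * Without -fopenmp-simd or -fopenmp the pragma is simply ignored. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        out[i] = (unsigned char)(a[i] ^ b[i]);
}
```

Assuming the file is saved as xor_simd.c (a hypothetical name), something like gcc -O2 -fopenmp-simd -c xor_simd.c should build it. With GCC, adding -fopt-info-vec is one way to check whether the loop was vectorized. Calling xor_bytes(out, "1110001111", "0000110011", 10) with suitable casts yields the same ten values as the scalar code, 1110111100.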

Note that the printf function calls will almost certainly interfere with vectorization, so you should remove them from any loop that you want to see vectorized.
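If the compiler does not vectorize the loop on its own, the intrinsics route mentioned above might look like the following sketch. It uses only SSE2 intrinsics from emmintrin.h, which any SSE4.2-capable CPU also supports. The function name xor_bytes_sse2 and its signature are assumptions made for illustration, and no printing happens inside the loops.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_xor_si128, ... */
#include <stddef.h>

/* Hypothetical helper: XOR n bytes of a and b into out,
 * 16 bytes per iteration, with a scalar loop for the remainder. */
void xor_bytes_sse2(unsigned char *out,
                    const unsigned char *a,
                    const unsigned char *b,
                    size_t n)
{
    size_t i = 0;

    /* Vector part: unaligned loads and stores, so the buffers
     * need no particular alignment. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_xor_si128(va, vb));
    }

    /* Scalar tail: the last n % 16 bytes. */
    for (; i < n; ++i)
        out[i] = (unsigned char)(a[i] ^ b[i]);
}
```

With LENGTH set to 10 as in the question, only the scalar tail runs, because the input is shorter than one 16-byte register. Any speedup from this approach would show up only on much longer buffers, and it should be measured over many repetitions rather than a single call.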

