Faster method to copy arrays after typecasting in C?

I have a two-dimensional integer array InArray[2][60] carrying short data in the 2 least significant bytes and bit-field data in the 2 most significant bytes. Please suggest a faster method to extract the short data and copy it to a short OutArray[60], something along the lines of memcpy(). I presume iterating through each item is not the optimal way of doing this. TIA

EDIT: Adding code snippet

int InArray[2][60];
short OutArray[60];
for (int i=0; i < 60;i++)
{
    OutArray[i] = (short)(InArray[0][i] & 0xffff);
}

Is there a better and possibly faster way of doing this?

If you really are copying a 60-element array, then it does not matter.

If the array is larger and/or you do this many times, then you'll want to have a look at SIMD instruction sets: SSEx on Intel platforms, AltiVec on PPC...

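For instance, with SSE4.1 you may use _mm_packus_epi32(), which packs two vectors of four 32-bit operands into eight 16-bit operands with unsigned saturation. Because the pack saturates, the bit-field data in the upper 16 bits has to be masked off first; otherwise any value outside 0..65535 is clamped.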

Your compiler probably has intrinsics for these: http://msdn.microsoft.com/en-us/library/hh977022.aspx , http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/PowerPC-AltiVec-Built_002din-Functions.html ...
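On x86 targets where only SSE2 can be assumed, the same narrowing can be sketched with the signed pack _mm_packs_epi32() instead of _mm_packus_epi32(). This is a minimal sketch under stated assumptions, not code from the answer: the helper name low16_copy is hypothetical, the fixed-width types stand in for a 32-bit int and a 16-bit short, and a plain scalar loop is used when SSE2 is not available. Each lane is sign-extended from its low 16 bits first, so the saturating pack has nothing left to clamp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: store the low 16 bits of each 32-bit input in out. */
static void low16_copy(const int32_t *in, int16_t *out, size_t n)
{
    size_t i = 0;
    assert(in != NULL && out != NULL);
#if defined(__SSE2__)
    /* Eight inputs per iteration; unaligned loads and stores are used,
       so no alignment is assumed for either array. */
    for (; i + 8 <= n; i += 8) {
        __m128i a = _mm_loadu_si128((const __m128i *)(in + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(in + i + 4));
        /* Shift left then arithmetic-shift right: drops the upper 16 bits
           and sign-extends the lower 16, so every lane fits in int16_t. */
        a = _mm_srai_epi32(_mm_slli_epi32(a, 16), 16);
        b = _mm_srai_epi32(_mm_slli_epi32(b, 16), 16);
        /* Signed saturating pack; lossless after the sign extension. */
        _mm_storeu_si128((__m128i *)(out + i), _mm_packs_epi32(a, b));
    }
#endif
    /* Scalar tail (or the whole copy when SSE2 is unavailable). */
    for (; i < n; i++)
        out[i] = (int16_t)(uint16_t)((uint32_t)in[i] & 0xffffu);
}
```

For the arrays in the question, and assuming int is 32 bits and short is 16 bits on the target, a call would look like low16_copy(InArray[0], OutArray, 60).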

This is only going to help if you're doing something like this many times. I used Agner Fog's vectorclass to do this ( http://www.agner.org/optimize/vectorclass.zip ). It is a C++ class library for using SSE/AVX. You'll probably get the best answers if you add the tags SSE and AVX to your question.

You'll also get better results if you can ensure the arrays are 16-byte or 32-byte aligned. In the code below it would also help either to make the width of the arrays equal to 64 (even if you are only going to use 60 elements) or to make the length of the array a multiple of 64.

#include <stdio.h>
#include "vectorclass.h"

void foo(int InArray[2][60],  short OutArray[60]) {
    for (int i=0; i < 60; i++) {
        OutArray[i] = (short)(InArray[0][i] & 0xffff);
    }
}

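void foo_vec8s(int InArray[2][60],  short OutArray[60]) {
    int i=0;
    // Each Vec8s load reads 16 bytes, i.e. four ints viewed as eight shorts.
    // On a little-endian target the even-indexed shorts are the low halves.
    for (; i <(60-8); i+=8) {
        Vec8s v1 = Vec8s().load(&InArray[0][i]);
        Vec8s v2 = Vec8s().load(&InArray[0][i+4]);
        // Indices 0-7 select from v1 and 8-15 from v2: keep every even short.
        Vec8s out = blend8s<0,2,4,6,8,10,12,14>(v1,v2);
        out.store(&OutArray[i]);
    }
    // Scalar clean-up for the last elements, since 60 is not a multiple of 8
    for (;i < 60; i++) {
        OutArray[i] = (short)(InArray[0][i] & 0xffff);
    }
}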

int main() {
    int InArray[2][60];
    for(int i=0; i<60; i++) { 
        InArray[0][i] = i | 0xffff0000;
    }

    short OutArray1[60] = {0};
    foo(InArray, OutArray1);
    for(int i=0; i<60; i++) {
        printf("%d ", OutArray1[i]);
    } printf("\n");

    short OutArray2[60] = {0};
    foo_vec8s(InArray, OutArray2);
    for(int i=0; i<60; i++) {
        printf("%d ", OutArray2[i]);
    } printf("\n");  
}
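The alignment and padding advice could look roughly like this in plain C11. It is only a sketch: the names InArrayPadded, OutArrayPadded, USED_LEN, PADDED_LEN and extract_low16_padded are made up for illustration, and the fixed-width types assume a 32-bit int and a 16-bit short. With the row width padded to 64 the loop needs no clean-up pass, and a compiler may be able to vectorize it on its own, though that depends on the compiler and its flags.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define USED_LEN   60   /* elements actually used, as in the question */
#define PADDED_LEN 64   /* padded row width (hypothetical choice) */

static_assert(PADDED_LEN % 8 == 0, "row width must be a multiple of 8");
static_assert(USED_LEN <= PADDED_LEN, "padding must not shrink the row");

/* 32-byte alignment satisfies both 16-byte and 32-byte aligned accesses.
   A row is 64 * 4 = 256 bytes, so the second row is 32-byte aligned too. */
static alignas(32) int32_t InArrayPadded[2][PADDED_LEN];
static alignas(32) int16_t OutArrayPadded[PADDED_LEN];

/* Convert the whole padded row; the 4 padding elements are processed
   as well, which avoids a separate scalar tail loop. */
static void extract_low16_padded(void)
{
    for (int i = 0; i < PADDED_LEN; i++)
        OutArrayPadded[i] =
            (int16_t)(uint16_t)((uint32_t)InArrayPadded[0][i] & 0xffffu);
}
```

Only the first USED_LEN entries of OutArrayPadded are meaningful afterwards; the padding entries just mirror whatever the input padding held.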
