What is the most efficient way to convert an array of unsigned shorts (16 bits per value) to an array of unsigned ints (32 bits per value)?
Copy it.
unsigned short source[]; // …
unsigned int target[]; // …
unsigned short* const end = source + sizeof source / sizeof source[0];
std::copy(source, end, target);
std::copy internally chooses the best copying mechanism for the given input types. In this case, however, there's probably no better way than copying the elements individually in a loop.
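For reference, here is a minimal sketch of the same idea wrapped in a function, so the element count is passed explicitly instead of being derived with sizeof. The name widen_copy is made up for illustration and is not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies n 16-bit values into a 32-bit destination.
// Each element is converted from unsigned short to unsigned int on assignment.
void widen_copy(const unsigned short* src, std::size_t n, unsigned int* dst) {
    assert((src != nullptr && dst != nullptr) || n == 0);
    std::copy(src, src + n, dst);
}
```

The destination must have room for at least n elements; std::copy does not check that.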
Use std::copy in C++:
#include<algorithm> //must include
unsigned short ushorts[M]; //where M is some const +ve integer
unsigned int uints[N]; //where N >= M
//...fill ushorts
std::copy(ushorts, ushorts+M, uints);
In C, use a manual loop (in fact, a manual loop works in both C and C++):
int i = 0;
while( i < M ) { uints[i] = ushorts[i]; ++i; }
Here is an unrolled loop that accesses memory in 64-bit chunks. It might be a little faster than the simple loop, but testing is the only way to know.
It assumes that N is a multiple of four, that a short is 16 bits wide, that both arrays are suitably aligned for 64-bit access, and that working with 64-bit registers is possible.
typedef union u {
uint16_t us[4];
uint32_t ui[2];
uint64_t ull;
} u_t;
ushort_t src[N] = ...;
uint_t dst[N];
u_t *p_src = (u_t *) src;
u_t *p_dst = (u_t *) dst;
uint_t i;
u_t tmp, tmp2;
for(i=0; i<N/4; i++) {
tmp = p_src[i]; /* Read four shorts in one read access */
tmp2.ui[0] = tmp.us[0]; /* The union trick avoids complicated shifts that are furthermore dependent on endianness. */
tmp2.ui[1] = tmp.us[1]; /* The compiler should take care of optimal assembly decomposition. */
p_dst[2*i] = tmp2; /* Write the two first ints in one write access. */
tmp2.ui[0] = tmp.us[2];
tmp2.ui[1] = tmp.us[3];
p_dst[2*i+1] = tmp2; /* Write the 2 next ints in 1 write access. */
}
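If the pointer casts above are a concern (they rely on alignment, and reading a union member other than the one last written is not well defined in C++), the same four-at-a-time idea can be sketched with std::memcpy instead. This is a swapped-in variant under the same assumption that the length is a multiple of four, not the code that was benchmarked below; widen_by_four is a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: widens n 16-bit values, four per iteration.
// n must be a multiple of 4.
void widen_by_four(const std::uint16_t* src, std::uint32_t* dst, std::size_t n) {
    assert(n % 4 == 0);
    for (std::size_t i = 0; i < n; i += 4) {
        std::uint16_t chunk[4];
        std::uint32_t wide[4];
        // Fetch 8 bytes; compilers typically lower this to a single load.
        std::memcpy(chunk, src + i, sizeof chunk);
        for (int k = 0; k < 4; ++k) {
            wide[k] = chunk[k];  // value conversion, independent of endianness
        }
        // Store the four widened values (16 bytes).
        std::memcpy(dst + i, wide, sizeof wide);
    }
}
```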
EDIT
So I just tested it on SUN M5000 (SPARC64 VII 2.5 GHz) with GCC 3.4.1 in 64-bit mode on a 4,000,000 element array. The naive implementation is a bit faster. I tried with SUNStudio 12 and with GCC 4.3, but I haven't been able to even compile the program because of the array size.
EDIT2
I managed to compile it now on GCC 4.3. The optimized version is a bit faster than the naive one.
            GCC 3.4    GCC 4.3
naive       11.1 ms    11.8 ms
optimized   12.4 ms    10.0 ms
EDIT3
The conclusion, as far as C is concerned: don't bother with an optimized version of the copy loop. The gain is so small that the risk of error outweighs the benefit.
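To repeat this kind of measurement on your own machine, a rough timing sketch could look like the following. It is not the harness used for the numbers above; time_simple_copy_ms is a hypothetical name, and a single run like this is noisy, so run it several times and compare.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: times one pass of the simple loop, in milliseconds.
// dst must already have the same size as src so that allocation is not timed.
double time_simple_copy_ms(const std::vector<unsigned short>& src,
                           std::vector<unsigned int>& dst) {
    assert(dst.size() == src.size());
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = src[i];
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Swapping the loop body for another variant gives a like-for-like comparison, provided both are built with the same compiler and optimization flags.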
What about
unsigned short src[N] = ...;
unsigned int dst[N];
for(i=0; i<N; ++i)
dst[i] = src[i];
For a C++ version Konrad's or Nawaz's answers are surely better suited.
Create an int[] with the same length as the short[]. Loop over the short[], assigning the i-th element of the short[] to the i-th position of the int[].
On many architectures a decrementing do-while may be faster than the for and while loops proposed here. Something like:
unsigned short ushorts[M];
unsigned int uints[N];
int i = M-1;
do{
uints[i] = ushorts[i];
i--;
} while(i >= 0);
The compiler can take care of most optimizations such as loop unrolling, but generally the above is faster (on many architectures) because a do-while skips the initial condition check that a while or for performs before the first iteration.
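One caveat with the do-while above: it always runs the body once, so with a length of zero it would touch index -1. A count-down sketch that also copes with an empty input, using an unsigned index, might look like this (widen_counting_down is a made-up name):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies from the last element down to the first.
void widen_counting_down(const unsigned short* src, std::size_t n, unsigned int* dst) {
    assert((src != nullptr && dst != nullptr) || n == 0);
    // "i-- > 0" tests before decrementing, so the body sees n-1 down to 0
    // and is skipped entirely when n is 0.
    for (std::size_t i = n; i-- > 0; ) {
        dst[i] = src[i];
    }
}
```

Whether this beats a forward loop depends on the compiler and target, so measure before relying on it.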
There may be faster ways as well, such as doing it entirely with pointer arithmetic. This could turn into a fun exercise of disassembling the code and analyzing to see which appears faster. It is all architecture dependent. Fortunately others have done this work for you with std::copy.
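For the curious, a pointer-arithmetic version could be sketched as below. The name widen_with_pointers is invented for this example, and modern compilers often generate the same machine code for this and for the indexed loop.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: walks both arrays with pointers instead of an index.
void widen_with_pointers(const unsigned short* src, std::size_t n, unsigned int* dst) {
    assert((src != nullptr && dst != nullptr) || n == 0);
    const unsigned short* const end = src + n;  // one past the last source element
    while (src != end) {
        *dst++ = *src++;  // widen one element, then advance both pointers
    }
}
```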
Just cast the address of the short array to a union pointer and access each element of the short array through it, like pTp32[0..LEN/2-1].arr[0..1] (LEN shorts make up LEN/2 unions):
unsigned short shrtArray[LEN]; //..
union type32
{
short arr[2];
int value;
};
type32 * pTp32 = (type32*)shrtArray;
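Keep in mind that an overlay like this reinterprets the existing bytes rather than widening each value: reading the 32-bit member yields two shorts packed together. The sketch below illustrates that, using std::memcpy in place of the pointer cast (a substitution made here to sidestep alignment and aliasing questions); overlay_pair is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: views two adjacent 16-bit values as one 32-bit value.
std::uint32_t overlay_pair(const std::uint16_t* pair) {
    assert(pair != nullptr);
    std::uint32_t packed;
    // Copies the raw bytes of pair[0] and pair[1]; no value conversion happens.
    std::memcpy(&packed, pair, sizeof packed);
    return packed;
}
```

For the pair {1, 2} the result is 0x00020001 on a little-endian machine and 0x00010002 on a big-endian one, so neither short comes out as its own 32-bit value.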