I'm working in C++ with an array of unsigned char representing the pixels of an image. Each pixel has 3 channels (R, G, B). The image is stored linearly with the channels interleaved, like
RGBRGBRGBRGB.....
How do I split the R, G and B channels into separate arrays efficiently?
I tried:
for(int pos = 0; pos < srcWidth * srcHeight; pos++) {
    int rgbPos = pos * 3;
    splitChannels[0][pos] = rgbSrcData[rgbPos];
    splitChannels[1][pos] = rgbSrcData[rgbPos + 1];
    splitChannels[2][pos] = rgbSrcData[rgbPos + 2];
}
But this is surprisingly slow.
Thanks!
My attempt: load and store the bytes four by four. Byte scrambling will be tedious but possibly throughput will improve.
// Load 4 interleaved pixels (12 bytes) as three 32-bit words.
// Assumes little-endian byte order; i advances by 3 and j by 1 per group of 4 pixels.
unsigned int RGB0= ((unsigned int*)rgbSrcData)[i];     // R0 G0 B0 R1
unsigned int RGB1= ((unsigned int*)rgbSrcData)[i + 1]; // G1 B1 R2 G2
unsigned int RGB2= ((unsigned int*)rgbSrcData)[i + 2]; // B2 R3 G3 B3
// Rearrange and store 4 unpacked pixels
((unsigned int*)splitChannels[0])[j]=
    (RGB0 & 0xFF) | ((RGB0 >> 16) & 0xFF00) | (RGB1 & 0xFF0000) | ((RGB2 & 0xFF00) << 16);
((unsigned int*)splitChannels[1])[j]=
    ((RGB0 & 0xFF00) >> 8) | ((RGB1 & 0xFF) << 8) | ((RGB1 >> 8) & 0xFF0000) | ((RGB2 & 0xFF0000) << 8);
((unsigned int*)splitChannels[2])[j]=
    ((RGB0 & 0xFF0000) >> 16) | (RGB1 & 0xFF00) | ((RGB2 & 0xFF) << 16) | (RGB2 & 0xFF000000);
(CAUTION: not checked!) A shift-only version is also possible.
An SSE solution would be more complex (the stride 3 does not get along with powers of 2).
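To make the word-at-a-time idea above easier to try out, here is a minimal sketch of it as a complete function. The name splitRgbWords and its parameters are hypothetical, not from the original post. It assumes a little-endian CPU, uses std::memcpy for the 32-bit loads and stores to sidestep alignment and aliasing problems, and falls back to the byte-by-byte loop for any leftover pixels when the count is not a multiple of 4. Whether it actually beats the simple loop depends on the compiler and the hardware, so measure before adopting it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: de-interleaves pixelCount RGB pixels into three planes.
// ASSUMPTION: little-endian byte order (the masks below depend on it).
void splitRgbWords(const unsigned char* src, std::size_t pixelCount,
                   unsigned char* r, unsigned char* g, unsigned char* b)
{
    std::size_t p = 0;
    // Main loop: 4 pixels (12 source bytes) per iteration.
    for (; p + 4 <= pixelCount; p += 4) {
        std::uint32_t w0, w1, w2;
        std::memcpy(&w0, src + 3 * p,     4); // bytes R0 G0 B0 R1
        std::memcpy(&w1, src + 3 * p + 4, 4); // bytes G1 B1 R2 G2
        std::memcpy(&w2, src + 3 * p + 8, 4); // bytes B2 R3 G3 B3

        const std::uint32_t rr = (w0 & 0xFFu) | ((w0 >> 16) & 0xFF00u)
                               | (w1 & 0xFF0000u) | ((w2 & 0xFF00u) << 16);
        const std::uint32_t gg = ((w0 >> 8) & 0xFFu) | ((w1 & 0xFFu) << 8)
                               | ((w1 >> 8) & 0xFF0000u) | ((w2 & 0xFF0000u) << 8);
        const std::uint32_t bb = ((w0 >> 16) & 0xFFu) | (w1 & 0xFF00u)
                               | ((w2 & 0xFFu) << 16) | (w2 & 0xFF000000u);

        std::memcpy(r + p, &rr, 4);
        std::memcpy(g + p, &gg, 4);
        std::memcpy(b + p, &bb, 4);
    }
    // Tail: remaining 0 to 3 pixels, one byte at a time.
    for (; p < pixelCount; ++p) {
        r[p] = src[3 * p];
        g[p] = src[3 * p + 1];
        b[p] = src[3 * p + 2];
    }
}
```

With the names from the question, a call would look like splitRgbWords(rgbSrcData, srcWidth * srcHeight, splitChannels[0], splitChannels[1], splitChannels[2]).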
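A good technique for making it run faster is loop unwinding (also known as loop unrolling): handle several pixels per iteration so that the loop counter and branch overhead is paid less often. You can read about it here: http://en.wikipedia.org/wiki/Loop_unwinding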
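As an illustration of the unwinding idea, here is a minimal sketch that processes four pixels per iteration and finishes any leftover pixels with the plain loop. The function name splitRgbUnrolled and its parameters are made up for this example. Optimizing compilers often unroll loops like this on their own, so treat it as something to benchmark rather than a guaranteed speed-up.

```cpp
#include <cstddef>

// Hypothetical helper: same result as the simple loop, manually unrolled by 4.
void splitRgbUnrolled(const unsigned char* src, std::size_t pixelCount,
                      unsigned char* r, unsigned char* g, unsigned char* b)
{
    std::size_t p = 0;
    // Main loop: four pixels (12 source bytes) per iteration.
    for (; p + 4 <= pixelCount; p += 4) {
        const unsigned char* s = src + 3 * p;
        r[p]     = s[0];  g[p]     = s[1];   b[p]     = s[2];
        r[p + 1] = s[3];  g[p + 1] = s[4];   b[p + 1] = s[5];
        r[p + 2] = s[6];  g[p + 2] = s[7];   b[p + 2] = s[8];
        r[p + 3] = s[9];  g[p + 3] = s[10];  b[p + 3] = s[11];
    }
    // Tail: remaining 0 to 3 pixels.
    for (; p < pixelCount; ++p) {
        r[p] = src[3 * p];
        g[p] = src[3 * p + 1];
        b[p] = src[3 * p + 2];
    }
}
```

Unlike the 32-bit word approach, this version does not depend on byte order or alignment, since it only ever touches single bytes.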