
Convert RGB to RGBA in C

I need to copy the contents of a byte array representing an image in RGB byte order into another RGBA (4 bytes per pixel) buffer. The alpha channel will get filled later. What would be the fastest way of achieving this?

How tricky do you want it? You could set it up to copy a 4-byte word at a time, which might be a bit faster on some 32-bit systems:

void fast_unpack(char* rgba, const char* rgb, const int count) {
    if(count==0)
        return;
    for(int i=count; --i; rgba+=4, rgb+=3) {
        *(uint32_t*)(void*)rgba = *(const uint32_t*)(const void*)rgb;
    }
    for(int j=0; j<3; ++j) {
        rgba[j] = rgb[j];
    }
}

The extra case on the end is to deal with the fact that the rgb array is missing a byte: a 4-byte read starting at the last pixel would run one byte past the end of the source buffer. You could also make it a bit faster using aligned moves and SSE instructions, working in multiples of 4 pixels at a time. If you're feeling really ambitious, you can try even more horribly obfuscated things like prefetching a cache line into the FP registers, for example, then blitting it across to the other image all at once. Of course the mileage you get out of these optimizations is going to be highly dependent on the specific system configuration you are targeting, and I would be really skeptical that there is much benefit at all to doing any of this instead of the simple thing.
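One caveat about the word-copy trick: the pointer casts read and write uint32_t values at addresses that are generally not 4-byte aligned, which the C standard does not guarantee to work. Below is a minimal sketch of the same idea using fixed-size memcpy calls instead, which compilers commonly reduce to a single load and store. The name memcpy_unpack and the unsigned char signature are my own assumptions, not part of the code above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variant of fast_unpack. count is in pixels:
   rgb must hold 3*count bytes and rgba must hold 4*count bytes. */
void memcpy_unpack(unsigned char *rgba, const unsigned char *rgb, size_t count)
{
    if (count == 0)
        return;

    /* Every pixel except the last: move 4 bytes at once. The 4th byte is
       the next pixel's red value; it lands in the alpha slot, which the
       caller is expected to overwrite later. */
    for (size_t i = 0; i + 1 < count; ++i) {
        uint32_t word;
        memcpy(&word, rgb + 3 * i, sizeof word);
        memcpy(rgba + 4 * i, &word, sizeof word);
    }

    /* Last pixel: copy only 3 bytes so we never read past the end of rgb. */
    memcpy(rgba + 4 * (count - 1), rgb + 3 * (count - 1), 3);
}
```

Whether this is as fast as the cast-based version depends on the compiler and target, so it is worth measuring before relying on it.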

And my simple experiments confirm that this is indeed a little bit faster, at least on my x86 machine. Here is a benchmark:

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <time.h>

void fast_unpack(char* rgba, const char* rgb, const int count) {
    if(count==0)
        return;
    for(int i=count; --i; rgba+=4, rgb+=3) {
        *(uint32_t*)(void*)rgba = *(const uint32_t*)(const void*)rgb;
    }
    for(int j=0; j<3; ++j) {
        rgba[j] = rgb[j];
    }
}

void simple_unpack(char* rgba, const char* rgb, const int count) {
    for(int i=0; i<count; ++i) {
        for(int j=0; j<3; ++j) {
            rgba[j] = rgb[j];
        }
        rgba += 4;
        rgb  += 3;
    }
}

int main() {
    const int count = 512*512;
    const int N = 10000;

    char* src = (char*)malloc(count * 3);
    char* dst = (char*)malloc(count * 4);

    clock_t c0, c1;    
    double t;
    printf("Image size = %d bytes\n", count);
    printf("Number of iterations = %d\n", N);

    printf("Testing simple unpack....");
    c0 = clock();
    for(int i=0; i<N; ++i) {
        simple_unpack(dst, src, count);
    }
    c1 = clock();
    printf("Done\n");
    t = (double)(c1 - c0) / (double)CLOCKS_PER_SEC;
    printf("Elapsed time: %lf\nAverage time: %lf\n", t, t/N);


    printf("Testing tricky unpack....");
    c0 = clock();
    for(int i=0; i<N; ++i) {
        fast_unpack(dst, src, count);
    }
    c1 = clock();
    printf("Done\n");
    t = (double)(c1 - c0) / (double)CLOCKS_PER_SEC;
    printf("Elapsed time: %lf\nAverage time: %lf\n", t, t/N);

    return 0;
}

And here are the results (compiled with g++ -O3):

Image size = 262144 bytes
Number of iterations = 10000
Testing simple unpack....Done
Elapsed time: 3.830000
Average time: 0.000383
Testing tricky unpack....Done
Elapsed time: 2.390000
Average time: 0.000239

So, the tricky version takes maybe about 40% less time on a good day (2.39 s versus 3.83 s here).

The fastest way would be to use a library that implements the conversion for you rather than writing it yourself. Which platform[s] are you targeting?

If you insist on writing it yourself for some reason, write a simple and correct version first. Use that. If the performance is inadequate, then you can think about optimizing it. In general, this sort of conversion is best done using vector permutes, but the exact optimal sequence varies depending on the target architecture.

#include <stddef.h>  /* size_t */

struct rgb {
   char r;
   char g;
   char b;
};

struct rgba {
   char r;
   char g;
   char b;
   char a;
};

void convert(struct rgba * dst, const struct rgb * src, size_t num)
{
    size_t i;
    for (i=0; i<num; i++) {
        dst[i].r = src[i].r;
        dst[i].g = src[i].g;
        dst[i].b = src[i].b;
    }
}

This would be the cleaner solution, but since you mention an array of bytes, you should use this:

// num is still the size in pixels. So dst should have space for 4*num bytes,
// while src is supposed to be of length 3*num.
void convert(char * dst, const char * src, size_t num)
{
    size_t i;
    for (i=0; i<num; i++) {
        dst[4*i] = src[3*i];
        dst[4*i+1] = src[3*i+1];
        dst[4*i+2] = src[3*i+2];
    }
}

I think I remember a NeHe tutorial doing something like that, but fast.

It's here

The interesting part is here:

void flipIt(void* buffer)                       // Flips The Red And Blue Bytes (256x256)
{
    void* b = buffer;                       // Pointer To The Buffer
    __asm                               // Assembler Code To Follow
    {
        mov ecx, 256*256                    // Set Up A Counter (Dimensions Of Memory Block)
        mov ebx, b                      // Points ebx To Our Data (b)
        label:                          // Label Used For Looping
            mov al,[ebx+0]                  // Loads Value At ebx Into al
            mov ah,[ebx+2]                  // Loads Value At ebx+2 Into ah
            mov [ebx+2],al                  // Stores Value In al At ebx+2
            mov [ebx+0],ah                  // Stores Value In ah At ebx

            add ebx,3                   // Moves Through The Data By 3 Bytes
            dec ecx                     // Decreases Our Loop Counter
            jnz label                   // If Not Zero Jump Back To Label
    }
}

What it does is pretty self-explanatory, and it should be easy to transform this into adding the alpha byte.
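For comparison, here is a rough portable C sketch of that transformation: the same red/blue swap as flipIt, but written out of place into a 4-byte-per-pixel buffer. The function name flip_and_expand, the separate destination buffer, and the pixel-count parameter are my assumptions; the tutorial code works in place on a fixed 256x256 image. The alpha byte is left untouched, since the question says it gets filled later.

```c
#include <stddef.h>

/* Hypothetical C counterpart of flipIt that also widens each pixel.
   src holds 3*num bytes, dst holds 4*num bytes, and the two buffers
   must not overlap. */
void flip_and_expand(unsigned char *dst, const unsigned char *src, size_t num)
{
    for (size_t i = 0; i < num; ++i) {
        dst[4 * i + 0] = src[3 * i + 2];  /* third source byte goes first */
        dst[4 * i + 1] = src[3 * i + 1];  /* middle byte stays in place   */
        dst[4 * i + 2] = src[3 * i + 0];  /* first source byte goes third */
        /* dst[4 * i + 3] (alpha) is intentionally not written here. */
    }
}
```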

Just create an array 4/3 the size of the source array. Read the entire source array and write it to the RGBA array, inserting 255 for alpha after every 3 bytes.
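A minimal sketch of that approach, assuming the pixel count is known and the caller frees the result. The name rgb_to_rgba_opaque is made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocates a 4*num-byte RGBA buffer, copies the
   3*num source bytes into it and sets every alpha byte to 255.
   Returns NULL if the allocation fails; the caller must free() the result. */
unsigned char *rgb_to_rgba_opaque(const unsigned char *rgb, size_t num)
{
    unsigned char *rgba = (unsigned char *)malloc(num * 4);
    if (rgba == NULL)
        return NULL;

    for (size_t i = 0; i < num; ++i) {
        rgba[4 * i + 0] = rgb[3 * i + 0];
        rgba[4 * i + 1] = rgb[3 * i + 1];
        rgba[4 * i + 2] = rgb[3 * i + 2];
        rgba[4 * i + 3] = 255;            /* fully opaque */
    }
    return rgba;
}
```

Note that for num == 0 the result of malloc may legitimately be NULL, so a caller that needs to tell that apart from a failed allocation should check num first.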
