Efficient way to split up RGB values in C

I'm writing some software for a 32-bit Cortex-M0 microcontroller in C and I'm doing a lot of manipulations with 32-bit RGB values. They are handled in a 32-bit integer format like 0x00BBRRGG. I want to be able to do math with them without worrying about carry bits spilling between the colors, so I need to split them up into three uint8 values. Is there an efficient way of doing this? I'm assuming the inefficient way would be as follows:

blue = (RGB >> 16) & 0xFF;
green = (RGB >> 8) & 0xFF;
red = RGB & 0xFF;

//do math

new_RGB = (blue << 16) | (green << 8) | red;

Also, I have a couple of interfaces; one of them uses the format 0x00RRGGBB and the other uses 0x00BBRRGG. Is there an efficient way to convert between the two?

If you use a struct you don't need to do any bit shifting operations. I don't know whether this will be efficient with your particular processor, but you can define something simple like:

typedef struct xRGBPixel {
    unsigned char unused;
    unsigned char red;
    unsigned char green;
    unsigned char blue;
} xRGBPixel;

You can have a similar structure for the BRG pixels. (Are you sure it's BRG and not BGR? That's seriously weird and unconventional.)

If that's not as efficient, then Jonathan Leffler's suggestion in the comments about a union of a 32-bit int and an array of 4 unsigned char values may be a better fit. Something like this:

typedef union Pixel {
    uint32_t pixelAsInt;
    unsigned char pixelAsChar[4];
} Pixel;
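As a rough sketch of how such a union might be used (the helper name pixel_brighten and the saturating-add operation are illustrative assumptions, not part of the answer): each byte is processed on its own, so no carry can spill into a neighbouring color. Because every lane is treated the same way, this particular example does not depend on byte order; picking out one named channel by index would.

```c
#include <stdint.h>

typedef union Pixel {
    uint32_t pixelAsInt;
    unsigned char pixelAsChar[4];
} Pixel;

/* Hypothetical helper: add `amount` to each 8-bit lane, clamping at 255,
 * so that no carry reaches the neighbouring color. */
uint32_t pixel_brighten(uint32_t rgb, uint8_t amount)
{
    Pixel p;
    p.pixelAsInt = rgb;
    for (int i = 0; i < 4; i++) {
        unsigned v = (unsigned)p.pixelAsChar[i] + amount;
        p.pixelAsChar[i] = (unsigned char)(v > 255u ? 255u : v);
    }
    /* The top byte is unused in the 0x00...... layouts; clear it again. */
    return p.pixelAsInt & 0x00FFFFFFu;
}
```

For example, pixel_brighten(0x00F01020, 0x20) clamps the 0xF0 lane at 0xFF and leaves the other two lanes at 0x30 and 0x40.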

To convert 0x00RRGGBB to 0x00BBGGRR (assuming the second layout is really BGR, as the register comments below show) you can use the byte-reverse (endian conversion) instruction:

REV    r0,r0     ;0x00RRGGBB -> 0xBBGGRR00
LSRS   r0,r0,#8  ;0xBBGGRR00 -> 0x00BBGGRR
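For comparison, here is a minimal portable C sketch of the same conversion, plus the byte rotation that the 0x00BBRRGG layout from the question would need. The function names are made up for illustration, and whether the compiler turns the first function into REV plus a shift is something to check in the generated assembly.

```c
#include <stdint.h>

/* 0x00RRGGBB -> 0x00BBGGRR: same result as REV followed by LSRS #8. */
uint32_t rgb_to_bgr(uint32_t c)
{
    return ((c & 0x0000FFu) << 16) | (c & 0x00FF00u) | ((c >> 16) & 0xFFu);
}

/* 0x00RRGGBB -> 0x00BBRRGG: the layout named in the question. */
uint32_t rgb_to_brg(uint32_t c)
{
    return ((c & 0xFFu) << 16) | ((c >> 8) & 0xFFFFu);
}

/* 0x00BBRRGG -> 0x00RRGGBB: the inverse rotation. */
uint32_t brg_to_rgb(uint32_t c)
{
    return ((c & 0xFFFFu) << 8) | ((c >> 16) & 0xFFu);
}
```

The BRG conversions are a rotation of the three low bytes, so each needs only two shifts, two ANDs and one OR.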

An efficient way to do this could be to write an assembly function that loads as much data as fits in the free registers, performs the conversion on all of those registers, and writes them back.
Use the ARM procedure call standard as a reference on how to write an assembly function called from C.

Another way is simply to perform byte copies, but this requires 3-4* reads/writes per pixel, whereas the approach above requires only 2.

* 3 if the top byte is a don't-care (xxRRGGBB), 4 if it must be zero (00RRGGBB).

"I want to be able to do math with them without worrying about carry bits spilling between the colors, so I need to split them up into three uint8 values."

No, usually you do not need to (split them into three uint8 values). Consider this function:

uint32_t blend(const uint32_t argb0, const uint32_t argb1, const int phase)
{
    if (phase <= 0)
        return argb0;
    else
    if (phase < 256) {
        const uint32_t rb0 = argb0 & 0x00FF00FF;
        const uint32_t rb1 = argb1 & 0x00FF00FF;
        const uint32_t ag0 = (argb0 >> 8) & 0x00FF00FF;
        const uint32_t ag1 = (argb1 >> 8) & 0x00FF00FF;
        const uint32_t rb = rb1 * phase + (256 - phase) * rb0;
        const uint32_t ag = ag1 * phase + (256 - phase) * ag0;
        return ((rb & 0xFF00FF00u) >> 8)
             |  (ag & 0xFF00FF00u);
    } else
        return argb1;
}

This function implements a linear blend from color argb0 (phase <= 0) to argb1 (phase >= 256), by splitting each input vector (with four 8-bit components) into two vectors with two 16-bit components. Each 8-bit component then has 8 bits of headroom, so the multiplication by a phase of at most 256 cannot overflow into the neighbouring component.

If you don't need the alpha channel, then it may be more efficient to work on pairs of color values (say, for each pair of pixels), so that (0xRRGGBB, 0xrrggbb) is split into (0x00RR00BB, 0x00rr00bb, 0x00GG00gg). In the above blend function this means one fewer multiplication (but one more AND and one more OR operation).
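A minimal sketch of that pairing idea, assuming 0x00RRGGBB pixels and a phase already clamped to 0..256. The PixelPair type and the blend_pair name are hypothetical, introduced only for this illustration.

```c
#include <stdint.h>

/* Hypothetical container for two blended 0x00RRGGBB pixels. */
typedef struct PixelPair {
    uint32_t p0;
    uint32_t p1;
} PixelPair;

/* Blend from0 -> to0 and from1 -> to1 in one go.
 * phase must be in 0..256: 0 returns the "from" colors, 256 the "to" colors. */
PixelPair blend_pair(uint32_t from0, uint32_t from1,
                     uint32_t to0, uint32_t to1, uint32_t phase)
{
    const uint32_t inv = 256u - phase;

    /* Red and blue of each pixel: 0x00RR00BB and 0x00rr00bb. */
    const uint32_t rb0 = (to0 & 0x00FF00FFu) * phase + (from0 & 0x00FF00FFu) * inv;
    const uint32_t rb1 = (to1 & 0x00FF00FFu) * phase + (from1 & 0x00FF00FFu) * inv;

    /* Green of both pixels packed into one word: 0x00GG00gg. */
    const uint32_t gg_from = ((from0 & 0xFF00u) << 8) | ((from1 >> 8) & 0xFFu);
    const uint32_t gg_to   = ((to0 & 0xFF00u) << 8) | ((to1 >> 8) & 0xFFu);
    const uint32_t gg = gg_to * phase + gg_from * inv;

    /* Each 16-bit lane now holds its result in the upper 8 bits. */
    PixelPair out;
    out.p0 = ((rb0 >> 8) & 0x00FF00FFu) | ((gg >> 16) & 0x0000FF00u);
    out.p1 = ((rb1 >> 8) & 0x00FF00FFu) | (gg & 0x0000FF00u);
    return out;
}
```

Three packed words are multiplied for two pixels here, where calling blend once per pixel would multiply four. Whether that pays for the extra masking depends on the multiplier in the particular core.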

The 32-bit multiplication operation on Cortex-M0 devices varies between implementations. Some have a single-cycle multiplier; on others a multiplication takes 32 cycles. So, depending on the exact Cortex-M0 core used, replacing one multiplication with an AND and an OR may be a big speedup or a slight slowdown.


When you actually do need the separate components, leaving the splitting to the compiler often leads to better generated code: instead of passing the color by value, pass a pointer to the color value,

uint32_t  some_op(const uint32_t *const argb)
{
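    /* Which byte holds which channel depends on endianness: on a
       little-endian core, [0] is the least significant byte of *argb. */
    const uint32_t  a = ((const uint8_t *)argb)[0];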
    const uint32_t  r = ((const uint8_t *)argb)[1];
    const uint32_t  g = ((const uint8_t *)argb)[2];
    const uint32_t  b = ((const uint8_t *)argb)[3];

    /* Do something ... */

}

This is because many architectures have instructions that load an 8-bit value into a full register, setting all higher bits to zero (zero extension: ldrb for loads and uxtb between registers on the Cortex-M0 architecture; the C compiler will do this for you). Marking both the pointer and the pointed-to value, as well as the intermediate values, const should allow the compiler to optimize the access so that it happens at the best position in the generated code, rather than having to keep the value in a register. (This is especially true on architectures with few available registers, like 32-bit and 64-bit Intel and AMD architectures (x86 and x86-64). Cortex-M0 has 13 general-purpose 32-bit registers, r0 to r12, but it depends on the ABI used which ones are "free" to use in a function.)


Note that if you are using GCC to compile your code, you can use

uint32_t oabc_to_ocba(uint32_t c)
{
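    /* "+r" marks c as both input and output, so the same register is
       read and written; separate "=r" and "r" operands need not match. */
    asm volatile ( "rev %0, %0\n\t"
                 : "+r" (c)
                 );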
    return c >> 8;
}

to convert 0x0ABC to 0x0CBA and vice versa. Normally, it compiles to rev r0, r0, lsrs r0, r0, #8, bx lr, but the compiler can inline it and use another register instead of r0.
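As an alternative sketch without inline assembly, GCC and Clang both provide the __builtin_bswap32 builtin, which leaves instruction selection and register allocation to the compiler. The function name below is made up for illustration, and it is an assumption worth checking in the disassembly that a given toolchain emits rev for it on Cortex-M0.

```c
#include <stdint.h>

/* 0x0ABC -> 0x0CBA using the compiler builtin instead of inline asm.
 * The top byte of the input ends up in the low byte after the swap
 * and is discarded by the shift. */
uint32_t oabc_to_ocba_builtin(uint32_t c)
{
    return __builtin_bswap32(c) >> 8;
}
```

Because the compiler understands the builtin, it can also fold the conversion at compile time when the argument is a constant, which opaque inline assembly prevents.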

It is not portable, but since you are on an M0, and probably in little-endian mode, you can use bit fields or a union of a uint32_t and an array of uint8_t.

typedef struct {
    uint32_t red: 8;
    uint32_t green: 8;
    uint32_t blue: 8;
    uint32_t spare: 8;
} rgb_s;

static rgb_s var; // statics init to zero
var.red = 0x56;
var.green = 0x34;
var.blue = 0x12;

uint32_t myInt = *(uint32_t*)&var;  // myInt is now 0x00123456;

Use static, or make sure the spare field is zeroed out if it is important.

Or, using a union:

enum {Red, Green, Blue, Colors};

typedef union {
    uint32_t rgb;
    uint8_t color[Colors];
} rgb_u;

rgb_u var;
var.rgb = 0x0;
var.color[Red] = 0x56;
var.color[Green] = 0x34;
var.color[Blue] = 0x12;

assert(var.rgb == 0x123456); //the uint32 overlays the array

Again, neither is really portable, but both are common in embedded code. You need to know the endianness of your processor (the M0 can be configured as big- or little-endian, but the default is little). There are also anonymous unions in C now (since C11), but not all embedded compilers support them.

Your "inefficient" way probably boils down to just a few lines of machine code, and shifts are fast. The shift version will therefore execute very quickly, and micro-optimizations like that shouldn't be a concern in 99% of all applications.

Addressing the individual bytes through pointers/arrays is not necessarily a performance improvement. It might very well be the opposite, so check the generated assembly. If you use a struct/union solution, it should be for the sake of readability and not for micro-managing performance.

However, the shift version is superior when it comes to portability. When bit shifting, you don't have to worry about endianness, padding, alignment, or pointer aliasing, all of which could be issues with a struct/union solution.
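A minimal sketch of the shift-based approach wrapped in small helpers, assuming the 0x00RRGGBB layout. The Rgb8 type and all function names are hypothetical, and saturating addition merely stands in for the "do math" step.

```c
#include <stdint.h>

/* Hypothetical unpacked pixel: one byte per channel. */
typedef struct Rgb8 {
    uint8_t r, g, b;
} Rgb8;

/* Split 0x00RRGGBB into channels using shifts only. */
static inline Rgb8 unpack_rgb(uint32_t c)
{
    Rgb8 p = { (uint8_t)(c >> 16), (uint8_t)(c >> 8), (uint8_t)c };
    return p;
}

/* Rebuild 0x00RRGGBB from the channels. */
static inline uint32_t pack_rgb(Rgb8 p)
{
    return ((uint32_t)p.r << 16) | ((uint32_t)p.g << 8) | (uint32_t)p.b;
}

/* Add two channel values, clamping at 255 instead of wrapping. */
static inline uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255u ? 255u : s);
}

/* Channel-wise saturating add of two packed colors. */
uint32_t add_rgb_saturating(uint32_t x, uint32_t y)
{
    Rgb8 a = unpack_rgb(x);
    Rgb8 b = unpack_rgb(y);
    Rgb8 sum = { add_sat_u8(a.r, b.r), add_sat_u8(a.g, b.g), add_sat_u8(a.b, b.b) };
    return pack_rgb(sum);
}
```

Nothing here depends on how the target stores the integer in memory, so the same source gives the same result on a big-endian build.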

The root of the problem is actually the 32-bit integer representation. If you can get rid of that, it will solve a lot of problems. The ideal format here would be uint8_t color[3];.
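One way this could look, as a sketch: keep colors as uint8_t color[3] internally and pack into a 32-bit word only at the boundary of each interface. The enum constants and function names are assumptions made for this example; the two packed layouts are the ones mentioned in the question.

```c
#include <stdint.h>

enum { RED, GREEN, BLUE, CHANNELS };

/* Pack for the interface that expects 0x00RRGGBB. */
uint32_t pack_rrggbb(const uint8_t color[CHANNELS])
{
    return ((uint32_t)color[RED] << 16)
         | ((uint32_t)color[GREEN] << 8)
         |  (uint32_t)color[BLUE];
}

/* Pack for the interface that expects 0x00BBRRGG. */
uint32_t pack_bbrrgg(const uint8_t color[CHANNELS])
{
    return ((uint32_t)color[BLUE] << 16)
         | ((uint32_t)color[RED] << 8)
         |  (uint32_t)color[GREEN];
}

/* Example of per-channel math: scale every channel by factor/256. */
void scale_color(uint8_t color[CHANNELS], uint8_t factor)
{
    for (int i = 0; i < CHANNELS; i++)
        color[i] = (uint8_t)((color[i] * factor) >> 8);
}
```

With this arrangement the conversion between the two interface formats disappears as a separate step, since each format is just a different packing of the same array.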
