How to check the number of set bits in an 8-bit unsigned char?

I need to count the set bits (the bits equal to 1) of an unsigned char variable in C.

A similar question is How to count the number of set bits in a 32-bit integer? But it uses an algorithm that is not easily adaptable to 8-bit unsigned chars (or at least the adaptation is not apparent).

The algorithm suggested in the question How to count the number of set bits in a 32-bit integer? is trivially adapted to 8 bits:

int NumberOfSetBits( uint8_t b )
{
     b = b - ((b >> 1) & 0x55);
     b = (b & 0x33) + ((b >> 2) & 0x33);
     return (((b + (b >> 4)) & 0x0F) * 0x01);
}

It is simply a case of shortening the constants to the least significant eight bits and removing the final 24-bit right shift. Equally, it could be adapted for 16 bits using an 8-bit shift. Note that in the 8-bit case, the mechanical adaptation of the 32-bit algorithm results in a redundant * 0x01, which could be omitted.
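The 16-bit adaptation mentioned above could look roughly like the following. This is only a sketch: the name popcount16 is made up for illustration, and the constants are the 32-bit ones truncated to 16 bits, with the final multiply and 8-bit shift summing the two per-byte counts.

```c
#include <stdint.h>

/* Hypothetical 16-bit variant of the parallel-addition algorithm. */
unsigned int popcount16(uint16_t w)
{
    w = (uint16_t)(w - ((w >> 1) & 0x5555));            /* 2-bit partial sums */
    w = (uint16_t)((w & 0x3333) + ((w >> 2) & 0x3333)); /* 4-bit partial sums */
    w = (uint16_t)((w + (w >> 4)) & 0x0F0F);            /* one count per byte */
    /* Multiplying by 0x0101 adds the low-byte count into the high byte;
       the cast discards anything above bit 15 before the 8-bit shift. */
    return (unsigned int)((uint16_t)(w * 0x0101) >> 8);
}
```

Unlike the 8-bit case, the multiply is not redundant here, because there are two byte counts to add together.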

The fastest approach for an 8-bit variable is using a lookup table.

Build an array of 256 values, one per 8-bit combination. Each value should contain the number of set bits in its corresponding index:

int bit_count[] = {
// 00 01 02 03 04 05 06 07 08 09 0a, ... FE FF
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, ..., 7, 8
};

Getting the count for a combination is the same as looking up a value in the bit_count array. The advantage of this approach is that it is very fast.

You can generate the array using a simple program that counts bits one by one in a slow way:

for (int i = 0 ; i != 256 ; i++) {
    int count = 0;
    for (int p = 0 ; p != 8 ; p++) {
        if (i & (1 << p)) {
            count++;
        }
    }
    printf("%d, ", count);
}

(Demo that generates the table.)
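If pasting 256 literals into the source is inconvenient, one alternative is to fill the table once at startup. The sketch below assumes that a one-time initialization call is acceptable. The names bit_count_table, init_bit_count_table and lookup_bit_count are invented for this example. It relies on the fact that the count for i equals the lowest bit of i plus the count for i >> 1, which is already in the table.

```c
#include <stdint.h>

/* Hypothetical runtime-filled table; entry 0 is already 0 (static storage). */
static uint8_t bit_count_table[256];

/* Must be called once before the first lookup. */
void init_bit_count_table(void)
{
    for (int i = 1; i < 256; i++) {
        /* i >> 1 is smaller than i, so its entry has already been filled. */
        bit_count_table[i] = (uint8_t)((i & 1) + bit_count_table[i >> 1]);
    }
}

unsigned int lookup_bit_count(uint8_t b)
{
    return bit_count_table[b];
}
```

On a small MCU a const table placed in flash is probably preferable, since this version spends 256 bytes of RAM.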

If you would like to trade some CPU cycles for memory, you can use a 16-byte lookup table and perform two 4-bit lookups:

static const char split_lookup[] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

int bit_count(unsigned char n) {
    return split_lookup[n&0xF] + split_lookup[n>>4];
}

Demo.

Counting the number of digits different from 0 is also known as the Hamming weight. In this case, you are counting the number of 1s.

Dasblinkenlight provided you with a table-driven implementation, and Olaf provided you with a software-based solution. I think you have two other potential solutions. The first is to use a compiler extension; the second is to use an architecture-specific instruction with inline assembly from C.

For the first alternative, see GCC's __builtin_popcount(). (Thanks to Artless Noise.)
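A minimal sketch of using the builtin for a single byte is shown here. The wrapper name popcount8_builtin is hypothetical, and the fallback branch is an assumption added so that the code still compiles on compilers without the GCC extension.

```c
/* Hypothetical wrapper around the GCC/Clang builtin. */
unsigned int popcount8_builtin(unsigned char b)
{
#if defined(__GNUC__) || defined(__clang__)
    /* b is promoted to unsigned int; its upper bits are zero,
       so they do not contribute to the count. */
    return (unsigned int)__builtin_popcount(b);
#else
    /* Portable fallback: clear the lowest set bit until none remain. */
    unsigned int count = 0;
    while (b != 0) {
        b = (unsigned char)(b & (b - 1));
        count++;
    }
    return count;
#endif
}
```

Whether the builtin becomes a single instruction depends on the target and compiler flags; otherwise the compiler typically falls back to a library routine.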

For the second alternative, you did not specify the embedded processor, but I'm going to offer this in case it's ARM-based.

Some ARM processors have the VCNT instruction, which performs the count for you. So you could do it from C with inline assembly:

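#include <arm_neon.h>

/* Assumes GCC or Clang targeting AArch32 with NEON enabled (e.g. -mfpu=neon).
   VCNT works on NEON D/Q registers, so the byte is moved into a vector first. */
static inline unsigned int hamming_weight(unsigned char value) {
    uint8x8_t v = vdup_n_u8(value);          /* copy value into all 8 lanes */
    __asm__ ("vcnt.8 %P0, %P1"               /* %P prints the D-register name */
             : "=w" (v)
             : "w" (v));
    return vget_lane_u8(v, 0);               /* per-lane count, read from lane 0 */
}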

Also see Fastest way to count number of 1s in a register, ARM assembly.

For completeness, here is Kernighan's bit counting algorithm:

int count_bits(unsigned int n) {   /* unsigned: n - 1 is well defined for every input */
    int count = 0;
    while (n != 0) {
        n &= (n - 1);              /* clears the lowest set bit */
        count++;
    }
    return count;
}

Also see Please explain the logic behind Kernighan's bit counting algorithm.
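To make the logic concrete for a single byte, here is a small sketch with the key step pulled out into its own function. The names clear_lowest_set_bit and count_bits_u8 are invented for this illustration. Subtracting 1 flips the lowest set bit and every bit below it, so ANDing with the original value removes exactly that one bit, and the loop runs once per set bit.

```c
#include <stdint.h>

/* Hypothetical helper: returns b with its lowest set bit cleared. */
uint8_t clear_lowest_set_bit(uint8_t b)
{
    return (uint8_t)(b & (b - 1));
}

/* Hypothetical 8-bit version of Kernighan's loop. */
unsigned int count_bits_u8(uint8_t b)
{
    unsigned int count = 0;
    while (b != 0) {
        b = clear_lowest_set_bit(b);
        count++;
    }
    return count;
}
```

For the input 22 (binary 00010110) the value goes 00010110, 00010100, 00010000, 00000000, so the loop body runs three times.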

I think you are looking for a Hamming weight algorithm for 8 bits? If so, here is the code:

unsigned char in = 22; //This is your input number
unsigned char out = 0;
in = in - ((in>>1) & 0x55);
in = (in & 0x33) + ((in>>2) & 0x33);
out = (in + (in >> 4)) & 0x0F;

I made an optimized version. On a 32-bit processor, using multiplication, bit shifting and masking can produce smaller code for the same task, especially when the input domain is small (an 8-bit unsigned integer).

The following two code snippets are equivalent:

unsigned int bit_count_uint8(uint8_t x)
{
    uint32_t n;
    n = (uint32_t)(x * 0x08040201UL);
    n = (uint32_t)(((n >> 3) & 0x11111111UL) * 0x11111111UL);
    return (n >> 28) & 0x0F;
}

/*
unsigned int bit_count_uint8_naive(uint8_t x)
{
    x = x - ((x >> 1) & 0x55);
    x = (x & 0x33) + ((x >> 2) & 0x33);
    x = ((x + (x >> 4)) & 0x0F);
    return x;
}
*/
  • This produces the smallest binary code for IA-32, x86-64 and AArch32 (without the NEON instruction set), as far as I can find.
  • For x86-64, this doesn't use the fewest instructions, but the bit shifts and downcasting avoid the use of 64-bit instructions and therefore save a few bytes in the compiled binary.

Explanation

I denote the eight bits of the byte x, from MSB to LSB, as a, b, c, d, e, f, g and h.

                               abcdefgh
*   00001000 00000100 00000010 00000001 (make 4 copies of x
---------------------------------------  with appropriate
abc defgh0ab cdefgh0a bcdefgh0 abcdefgh  bit spacing)
>> 3                                   
---------------------------------------
    000defgh 0abcdefg h0abcdef gh0abcde
&   00010001 00010001 00010001 00010001
---------------------------------------
    000d000h 000c000g 000b000f 000a000e
*   00010001 00010001 00010001 00010001
---------------------------------------
    000d000h 000c000g 000b000f 000a000e
... 000h000c 000g000b 000f000a 000e
... 000c000g 000b000f 000a000e
... 000g000b 000f000a 000e
... 000b000f 000a000e
... 000f000a 000e
... 000a000e
... 000e
    ^^^^ (Bits 31-28 will contain the sum of the bits
          a, b, c, d, e, f, g and h. Extract these
          bits and we are done.)
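Since the input domain has only 256 values, the claimed equivalence can be checked exhaustively. The following is a sketch of such a check. The function names popcount8_mul, popcount8_ref and count_mismatches are made up here, and the reference implementation is a plain shift-and-add loop rather than anything from the answer.

```c
#include <stdint.h>

/* Multiply-based version, restated with fixed-width constants. */
unsigned int popcount8_mul(uint8_t x)
{
    uint32_t n = (uint32_t)(x * UINT32_C(0x08040201)); /* 4 spaced copies of x */
    n = ((n >> 3) & UINT32_C(0x11111111)) * UINT32_C(0x11111111);
    return (unsigned int)((n >> 28) & 0x0F);           /* sum ends up in bits 31-28 */
}

/* Slow reference: add up the bits one at a time. */
unsigned int popcount8_ref(uint8_t x)
{
    unsigned int count = 0;
    for (; x != 0; x >>= 1) {
        count += x & 1u;
    }
    return count;
}

/* Returns how many of the 256 possible inputs give different results. */
unsigned int count_mismatches(void)
{
    unsigned int mismatches = 0;
    for (unsigned int i = 0; i < 256; i++) {
        if (popcount8_mul((uint8_t)i) != popcount8_ref((uint8_t)i)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

The top nibble cannot overflow because each nibble of the masked value holds at most 1, so the sum of all eight nibbles is at most 8.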

Maybe not the fastest, but straightforward:

int count = 0;

for (int i = 0; i < 8; ++i) {
    unsigned char c = 1 << i;
    if (yourVar & c) {
        //bit n°i is set
        //first bit is bit n°0
        count++;
    }
}

For 8/16-bit MCUs, a loop will very likely be faster than the parallel-addition approach, as these MCUs cannot shift by more than one bit per instruction, so:

size_t popcount(uint8_t val)
{
    size_t cnt = 0;
    do {
        cnt += val & 1U;    // or: if ( val & 1 ) cnt++;
    } while ( val >>= 1 ) ;
    return cnt;
}

For the increment of cnt, you might want to profile both variants. If it is still too slow, an assembler implementation using the carry flag (if available) might be worth a try. While I am against using assembler optimizations in general, such algorithms are one of the few good exceptions (but still only after the C version proves too slow).

If you can spare the flash memory, a lookup table as proposed by @dasblinkenlight is likely the fastest approach.

Just a hint: for some architectures (notably ARM and x86/64), gcc has a builtin, __builtin_popcount(), which you might also want to try if available (although it takes at least an int). This might compile to a single CPU instruction; you cannot get faster or more compact than that.

Allow me to post a second answer. This one is the smallest possible for ARM processors with the Advanced SIMD extension (NEON). It's even smaller than __builtin_popcount() (since __builtin_popcount() is optimized for unsigned int input, not uint8_t).

#ifdef __ARM_NEON
/* ARM C Language Extensions (ACLE) recommends us to check __ARM_NEON before
   including <arm_neon.h> */
#include <arm_neon.h>

unsigned int bit_count_uint8(uint8_t x)
{
    /* Set all lanes at once so that the compiler won't emit instruction to
       zero-initialize other lanes. */
    uint8x8_t v = vdup_n_u8(x);
    /* Count the number of set bits for each lane (8-bit) in the vector. */
    v = vcnt_u8(v);
    /* Get lane 0 and discard other lanes. */
    return vget_lane_u8(v, 0);
}
#endif
