
C++ bit manipulation

I am trying to extract a character value from UTF-8 encoded data. Suppose I have two bytes, and I extract 5 bits from the first byte => 10111 and 6 bits from the second byte => 010000

so

ch1 = 10111;
ch2 = 010000;

How would I combine them to form 10111010000 and output its hex value as 0x5d0? Do I need to shift, or is there an easier way to do this? From the documentation I appear to be able to read characters sequentially — is there a similar function for reading bits? Also, it appears I would need something wider than a char, since 10111010000 is 11 bits long. Does anyone know how to go about this?

You need to use shifting, plus the | or |= operator.

unsigned int ch3 = (ch1 << 6) | ch2;
// ch3 = 0000010111010000

The comment above is written as if an unsigned int were 16 bits wide; the standard only guarantees at least that, so your mileage may vary, but the 11-bit result fits either way.

You will definitely need to use shift and OR.

First, declare an unsigned integer type of the right size. I like the C99 types defined in stdint.h, but your C++ compiler may not have them. If you don't have uint16_t then you can use unsigned short. That is at least 16 bits wide, so it can hold the 11 bits.

Then you would figure out which bits go into the high bits. It looks like it should be:

unsigned short ch1 = 0x17;
unsigned short ch2 = 0x10;
unsigned short result = (ch1 << 6) | ch2;

1: for combining them together:

char bytes[2] = { 0x17, 0x10 }; // for example

unsigned short result = 0;      // 00000000  00000000
result = bytes[0] << 6;         // 101 11000000
result |= bytes[1];             // 101 11010000

2: for printing it out as hex

std::cout << std::showbase << std::hex << <what you want to print>;

in this case:

std::cout << std::showbase << std::hex << result;
// output: 0x5d0 (the formatted value does not depend on endianness)

First, from K&R: "Almost everything about bitfields is implementation dependent".

The following works on MS Visual Studio 2008:

#include <stdio.h>
#include <string.h>

struct bitbag {
    unsigned int ch2 : 6;
    unsigned int ch1 : 6;
};

int main ()
{
    struct bitbag bits;

    memset(&bits, 0, sizeof(bits));

    bits.ch1 = 0x17;    // 010111
    bits.ch2 = 0x10;    // 010000

    printf ("0x%06x 0x%06x\n", bits.ch1, bits.ch2);
    printf ("0x%0x\n", bits);   // passes the whole struct to printf -- implementation-dependent

    return 0;
}

Produces the output:

0x000017 0x000010
0x5d0

However, I cannot guarantee that it will work the same way with all compilers. Note the memset, which initialises any padding to zero.
