I am trying to extract a character value from UTF-8. Suppose I have two characters, and I extract 5 bits from the first character => 10111 and 6 bits from the other character => 010000,
so
ch1 = 10111;
ch2 = 010000;
how would I combine them to form 10111010000 and output its hex as 0x5d0? Do I need to shift, or is there an easier way to do this? Checking the documentation, write appears to be able to read characters sequentially; is there a similar function for this? Also, it appears I would need a char buffer, since 10111010000 is 11 bits long. Does anyone know how to go about this?
You need to use shifting, plus the | or |= operator.
unsigned int ch3 = (ch1 << 6) | ch2;
// ch3 = 0000010111010000
I'm assuming here that an unsigned int is 16 bits. Your mileage may vary.
You will definitely need to use shift and OR.
First, declare an unsigned integer type of the right size. I like the C99 types defined in stdint.h, but your C++ compiler may not have them. If you don't have uint16_t, then you can use unsigned short. That is at least 16 bits wide, so it can hold 11 bits.
Then you would figure out which bits go into the high bits. It looks like it should be:
unsigned short ch1 = 0x17;
unsigned short ch2 = 0x10;
unsigned short result = (ch1 << 6) | ch2;
char bytes[2] = { 0x17, 0x10 }; // for example
unsigned short result = 0; // 00000000 00000000
result = bytes[0] << 6; // 101 11000000
result |= bytes[1]; // 101 11010000
std::cout << std::showbase << std::hex << <what you want to print>;
in this case:
std::cout << std::showbase << std::hex << result;
// output: 0x5d0 (shifts work on values, not on bytes in memory, so this does not depend on endianness or the operating system)
First, from K&R: "Almost everything about bitfields is implementation dependent".
The following works on MS Visual Studio 2008:
#include <stdio.h>
#include <string.h>
struct bitbag {
unsigned int ch2 : 6;
unsigned int ch1 : 6;
};
int main ()
{
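    struct bitbag bits;
    unsigned int raw = 0;
    memset(&bits, 0, sizeof(bits));
    bits.ch1 = 0x17; // 010111
    bits.ch2 = 0x10; // 010000
    printf ("0x%06x 0x%06x\n", bits.ch1, bits.ch2);
    /* Passing a struct to %x is undefined behaviour, so copy its bytes into an
       integer first. Assumes sizeof(bits) >= sizeof(raw). */
    memcpy(&raw, &bits, sizeof(raw));
    printf ("0x%x\n", raw);
    return 0;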
}
Produces the output:
0x000017 0x000010
0x5d0
However, I could not guarantee that it will work in the same way in all compilers. Note the memset, which initialises any padding to zero.