I'm trying to store a wchar_t string as octets, but I'm positive I'm doing it wrong. Would anybody mind validating my attempt? What happens when one character takes 4 bytes?
unsigned int i;
const wchar_t *wchar1 = L"abc";
wprintf(L"%ls\r\n", wchar1);
for (i = 0; i < wcslen(wchar1); i++) {
    printf("(%d)", (wchar1[i]) & 255);
    printf("(%d)", (wchar1[i] >> 8) & 255);
}
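Unicode text is always encoded. Popular encodings are UTF-8, UTF-16 and UTF-32. Only UTF-32 uses a fixed size per codepoint. UTF-16 uses surrogate pairs for codepoints outside the Basic Multilingual Plane, so such a codepoint takes two 16-bit units, i.e. two wchar_t on Windows, where wchar_t is 16 bits (on Linux and most Unix systems wchar_t is 32 bits and holds UTF-32). UTF-8 is byte-oriented and uses between 1 and 4 bytes per codepoint.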
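To make the size rules concrete, here is a small sketch that computes how many UTF-8 bytes a codepoint needs and how it splits into UTF-16 units. The function names are hypothetical, and it assumes the input is a valid Unicode scalar value (at most 0x10FFFF and not itself a surrogate).

```c
#include <stdint.h>

/* Number of UTF-8 bytes needed for a valid codepoint (1 to 4). */
int utf8_length(uint32_t cp)
{
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

/* Splits a valid codepoint into UTF-16 code units and returns how many were
   written (1 or 2). Codepoints above U+FFFF become a high/low surrogate pair. */
int utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));       /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));     /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 needs 4 UTF-8 bytes and becomes the pair 0xD83D 0xDE00 in UTF-16, which is why a 16-bit wchar_t cannot hold it in one unit.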
UTF-8 is an excellent choice if you need to transcode the text to a byte-oriented stream, and it is the most common encoding for text files and HTML on the Internet. On Windows you can use WideCharToMultiByte() with CodePage = CP_UTF8. A good portable alternative is the ICU library.
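If neither WideCharToMultiByte() nor ICU is an option, the transcoding can be sketched by hand. This hypothetical wide_to_utf8() assumes wchar_t holds either UTF-32 (as on Linux) or UTF-16 with surrogate pairs (as on Windows); it does not validate its input, so a lone surrogate is encoded as-is rather than rejected.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Encodes src as UTF-8 into dest (cap bytes, including the terminating 0).
   Assumption: wchar_t is UTF-32 or UTF-16 with surrogate pairs.
   Returns the number of bytes written without the terminator,
   or (size_t)-1 if dest is too small. */
size_t wide_to_utf8(const wchar_t *src, unsigned char *dest, size_t cap)
{
    size_t n = 0;
    while (*src) {
        uint32_t cp = (uint32_t)*src++;
        unsigned char buf[4];
        int len, k;

        /* Combine a UTF-16 surrogate pair into a single codepoint. */
        if (cp >= 0xD800 && cp <= 0xDBFF && *src >= 0xDC00 && *src <= 0xDFFF)
            cp = 0x10000 + ((cp - 0xD800) << 10) + ((uint32_t)*src++ - 0xDC00);

        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }
        if (n + (size_t)len + 1 > cap)   /* keep room for the terminator */
            return (size_t)-1;
        for (k = 0; k < len; k++)
            dest[n++] = buf[k];
    }
    if (cap == 0)
        return (size_t)-1;
    dest[n] = 0;
    return n;
}
```

Because the output is always UTF-8, the result is the same byte sequence on every platform, unlike a raw dump of the wchar_t memory.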
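Be careful with conversions that target the current locale's code page, such as wcstombs(). They can be lossy: characters with no equivalent in the code page cannot be represented. wcstombs() then fails and returns (size_t)-1, while WideCharToMultiByte() with a code page substitutes a default character such as ?.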
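One way to detect the lossy case before converting is to try each character with the standard wcrtomb() and stop at the first failure. This is a sketch; the helper name representable_in_locale is hypothetical, and the answer depends on whatever locale setlocale() has selected (a program that never calls setlocale() runs in the "C" locale).

```c
#include <limits.h>
#include <string.h>
#include <wchar.h>

/* Returns 1 if every character of src has a representation in the current
   locale's multibyte encoding, 0 if at least one would be lost. */
int representable_in_locale(const wchar_t *src)
{
    char buf[MB_LEN_MAX];          /* large enough for any one multibyte char */
    mbstate_t state;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    for (; *src; src++) {
        if (wcrtomb(buf, *src, &state) == (size_t)-1)
            return 0;              /* EILSEQ: no equivalent in this encoding */
    }
    return 1;
}
```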
You can use the wcstombs() (wide-character string to multibyte string) function provided in stdlib.h. The prototype is as follows:
#include <stdlib.h>
size_t wcstombs(char *dest, const wchar_t *src, size_t n);
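It converts the wchar_t string src into a char (octet) string in the current locale's multibyte encoding and writes at most n bytes to dest. It returns the number of bytes written, not counting the terminating null, or (size_t)-1 if a character cannot be converted.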
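Instead of guessing a buffer size, the required length can be measured first. ISO C guarantees that wcsrtombs() with a null destination only counts bytes, so this sketch uses it for the measurement and then calls wcstombs() with an exactly sized buffer. The function name to_multibyte is hypothetical; the caller must free() the result.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Converts src to a newly allocated multibyte string in the current locale's
   encoding. Returns NULL if a character is not representable or malloc fails. */
char *to_multibyte(const wchar_t *src)
{
    mbstate_t state;
    const wchar_t *p = src;
    size_t needed;
    char *dest;

    memset(&state, 0, sizeof state);          /* initial conversion state */
    needed = wcsrtombs(NULL, &p, 0, &state);  /* NULL dst: count bytes only */
    if (needed == (size_t)-1)
        return NULL;                          /* some character cannot be converted */
    dest = malloc(needed + 1);
    if (dest == NULL)
        return NULL;
    wcstombs(dest, src, needed + 1);          /* room for '\0', so it is written */
    return dest;
}
```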
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

wchar_t wide_string[] = L"Hellöw, Wörld! :)";   /* must be a wide literal */
char mb_string[512]; /* Might want to calculate a better, more realistic size! */
size_t i, length;
setlocale(LC_ALL, ""); /* use the environment's locale (e.g. a UTF-8 one), not "C" */
length = wcstombs(mb_string, wide_string, sizeof mb_string - 1);
/* If the limit is reached, wcstombs does not zero terminate the result, so we
 * leave room for the terminator and write it ourselves. */
if (length == (size_t)-1) {
    perror("wcstombs"); /* a character has no representation in this locale */
} else {
    mb_string[length] = '\0';
    for (i = 0; i < length; i++)
        printf("Octet #%zu: '%02x'\n", i, (unsigned char)mb_string[i]);
}
If you're trying to see the content of the memory buffer holding the string, you can do this:
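/* str is the const wchar_t * string to inspect, e.g. wchar1 from the question */
size_t i;
size_t len = wcslen(str) * sizeof(wchar_t); /* bytes used, without the terminator */
const unsigned char *ptr = (const unsigned char *)str; /* unsigned: no negative values */
for (i = 0; i < len; i++) {
    printf("(%u)", (unsigned)ptr[i]);
}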
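A raw memory dump shows the bytes in the host's byte order (little-endian on x86), and its length depends on sizeof(wchar_t): 2 bytes per character on Windows, 4 on Linux. If the octets are meant to be stored or sent elsewhere, a sketch like the following fixes the order explicitly. The name wide_to_le_bytes is hypothetical, and it assumes wchar_t is at most 32 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

_Static_assert(sizeof(wchar_t) <= 4, "sketch assumes wchar_t of at most 32 bits");

/* Serializes s (without its terminator) into out, writing each wchar_t as
   sizeof(wchar_t) bytes, least significant byte first, whatever the host's
   byte order. Returns the bytes written, or (size_t)-1 if cap is too small. */
size_t wide_to_le_bytes(const wchar_t *s, unsigned char *out, size_t cap)
{
    size_t n = 0;
    for (; *s; s++) {
        uint32_t v = (uint32_t)*s;   /* widen so shifts up to 24 bits are defined */
        size_t b;
        if (cap - n < sizeof(wchar_t))
            return (size_t)-1;
        for (b = 0; b < sizeof(wchar_t); b++)
            out[n++] = (unsigned char)((v >> (8 * b)) & 0xFF);
    }
    return n;
}
```

Note that the byte count still differs between platforms; converting to UTF-8 first avoids that too.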
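printf and wprintf should not be mixed on the same stream: the first output call fixes the orientation of stdout (wide, after wprintf), and byte output functions such as printf must not be applied to a wide-oriented stream afterwards. Using wprintf throughout works: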
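#include <wchar.h>

unsigned int i, b;
const wchar_t *wchar1 = L"abc";
wprintf(L"%ls\r\n", wchar1);
for (i = 0; i < wcslen(wchar1); i++)
{
    /* print every byte of the wchar_t, lowest first: 2 bytes on Windows, 4 on Linux */
    for (b = 0; b < sizeof(wchar_t); b++)
        wprintf(L"(%d)", (int)((wchar1[i] >> (8 * b)) & 255));
}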
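The orientation rule can be observed directly: the standard fwide() function with a mode of 0 only queries a stream's orientation without changing it. The helper name stream_orientation is hypothetical; this is a small sketch for checking which kind of output a stream will accept.

```c
#include <stdio.h>
#include <wchar.h>

/* Returns 1 if the stream is wide-oriented, -1 if byte-oriented, and 0 if no
   orientation has been fixed yet. fwide(stream, 0) never changes it. */
int stream_orientation(FILE *stream)
{
    int mode = fwide(stream, 0);
    return (mode > 0) - (mode < 0);
}
```

A freshly opened stream reports 0; after its first wprintf/fwprintf call it reports 1, and from then on only wide output functions should be used on it.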