wchar_t to octets - in C?

Question

I'm trying to store a wchar_t string as octets, but I'm fairly sure I'm doing it wrong. Would anybody mind validating my attempt? What happens when one character takes up 4 bytes?

  unsigned int i;
  const wchar_t *wchar1 = L"abc";
  wprintf(L"%ls\r\n", wchar1);

  for (i=0;i< wcslen(wchar1);i++) {
    printf("(%d)", (wchar1[i]) & 255);
    printf("(%d)", (wchar1[i] >> 8) & 255);
  }

Answer 1

Unicode text is always encoded. Popular encodings are UTF-8, UTF-16 and UTF-32. Only the last uses a fixed size for every code point. UTF-16 uses surrogate pairs for code points in the upper planes, so such a code point takes two 16-bit code units (2 wchar_t where wchar_t is 16 bits wide, as on Windows). UTF-8 is byte oriented and uses between 1 and 4 bytes to encode a code point.

UTF-8 is an excellent choice if you need to transcode the text to a byte-oriented stream. It is a very common choice for text files and HTML on the Internet. If you use Windows, you can use WideCharToMultiByte() with CodePage = CP_UTF8. A good alternative is the ICU library.

Be careful to avoid byte encodings that translate text to a code page, such as wcstombs(). They are lossy encodings: characters that don't have a corresponding character code in the code page are replaced by ? or make the conversion fail.

Answer 2

You can use the wcstombs() (wide-character string to multibyte string) function provided in stdlib.h.

The prototype is as follows:

#include <stdlib.h>

size_t wcstombs(char *dest, const wchar_t *src, size_t n);

It converts the wchar_t string in src into a char (aka octets) string, using the encoding of the current locale, and writes at most n bytes to dest.

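wchar_t wide_string[] = L"Hellöw, Wörld! :)";
char mb_string[512]; /* Might want to calculate a better, more realistic size! */
size_t i, length;

/* Non-ASCII characters only convert if the locale's encoding can represent
 * them. Needs <locale.h>, <stdio.h>, <stdlib.h>, <string.h> and <wchar.h>. */
setlocale(LC_ALL, "");

memset(mb_string, 0, 512);
length = wcstombs(mb_string, wide_string, 511);

/* mb_string will be zero terminated if it wasn't cancelled by reaching the limit
 * before being finished with converting. If the limit WAS reached, the string
 * will not be zero terminated and you must do it yourself - not happening here */

if (length == (size_t)-1)
   length = 0; /* conversion failed: a character could not be represented */

/* Cast to unsigned char so bytes >= 0x80 are not sign-extended */
for (i = 0; i < length; i++)
   printf("Octet #%lu: '%02x'\n", (unsigned long)i, (unsigned char)mb_string[i]);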

Answer 3

If you're trying to see the contents of the memory buffer holding the string, you can do this:

  size_t i;
  size_t len = wcslen(str) * sizeof(wchar_t);
  /* unsigned char avoids sign extension of bytes >= 0x80 */
  const unsigned char *ptr = (const unsigned char*)(str);
  for (i=0; i<len; i++) {
    printf("(%u)", (unsigned)ptr[i]);
  }

Answer 4

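I'm not sure why printf and wprintf do not work together, but a likely cause is that a stream's orientation (byte or wide) is fixed by the first I/O call on it, so mixing the two on stdout is unreliable. The following code works.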

unsigned int i;
const wchar_t *wchar1 = L"abc";
wprintf(L"%ls\r\n", wchar1);

for(i=0; i<wcslen(wchar1); i++)
{   
    wprintf(L"(%d)", (wchar1[i]) & 255);
    wprintf(L"(%d)", (wchar1[i] >> 8) & 255);
}


Answer 1 (4, accepted): 2010-07-25 15:30:31
Answer 2 (1): 2010-07-25 14:08:19
Answer 3 (0): 2010-07-25 14:15:20
Answer 4 (0): 2010-07-25 16:50:27
