String conversion from UTF-8 to UTF-16 big endian is failing (using C, C++)

I am using the GLib function g_convert() to convert a UTF-8 string to a UTF-16 big-endian string. The conversion fails with an error saying "conversion is not supported".

Could someone give a clue on how to overcome this issue?

Thanks

The following is the piece of code used to convert the string from UTF-8 to UTF-16 big endian:

unsigned short *result_str;
gsize bytes_read, bytes_written;
gssize len = 0;
GError *error = NULL;

result_str = (unsigned short *)g_convert("text data", len, "UTF-16BE", "UTF-8", &bytes_read, &bytes_written, &error);

len is 0. The GLib manual says that len must be -1 for a NUL-terminated string (otherwise it is the length of the input in bytes).

g_convert uses iconv under the covers.

On my machine, using Cygwin, I can run

iconv -l

which lists the supported encodings, and UTF-16BE does appear in the list. However:

$ iconv -l | grep BE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-4BE
UTF-16BE
UTF-32BE

James@XPL3KWK28 ~
$ iconv -f UTF-8 -t UTF16-BE
iconv: conversion to UTF16-BE unsupported
iconv: try 'iconv -l' to get the list of supported encodings

As you can see, iconv reports the conversion from UTF-8 as unsupported. (Note that the command above spells the target UTF16-BE, whereas the list shows UTF-16BE.)

You probably need to do this in two stages: UTF-8 to UTF-16, then UTF-16 to UTF-16BE.

I suspect UTF-16BE is not supported by g_convert (based on the error message). It's trivial to convert UTF-8 into UTF-16BE, though (no tables or other garbage like that), so you can do that transformation yourself.

You might also want to check whether UTF-16 is supported and do your own byte swapping if necessary. But I do not believe g_convert supports UTF-16 either.
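The do-it-yourself conversion suggested above could look roughly like the following. This is only a sketch: utf8_to_utf16be is a hypothetical helper name, not a GLib function, and it assumes a NUL-terminated input and a caller-supplied output buffer. It rejects truncated sequences and surrogate code points but, to stay short, does not reject overlong encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of GLib): converts the NUL-terminated UTF-8
 * string `in` to UTF-16BE bytes in `out`, without a BOM or terminator.
 * Returns the number of bytes written, or (size_t)-1 if the input is
 * malformed or `out` is too small.
 */
size_t utf8_to_utf16be(const char *in, unsigned char *out, size_t out_size)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);

    while (*s != 0) {
        uint32_t cp;
        int extra; /* number of continuation bytes expected */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or invalid lead byte */
        s++;

        while (extra-- > 0) {
            /* A NUL here also fails this test, so we never read past the end. */
            if ((*s & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*s++ & 0x3F);
        }

        /* Surrogates and values above U+10FFFF are not valid code points. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return (size_t)-1;

        if (cp < 0x10000) {
            /* One 16-bit unit, high byte first. */
            if (out_size - n < 2) return (size_t)-1;
            out[n++] = (unsigned char)(cp >> 8);
            out[n++] = (unsigned char)(cp & 0xFF);
        } else {
            /* Surrogate pair, each unit high byte first. */
            uint32_t v  = cp - 0x10000;
            uint32_t hi = 0xD800 | (v >> 10);
            uint32_t lo = 0xDC00 | (v & 0x3FF);
            if (out_size - n < 4) return (size_t)-1;
            out[n++] = (unsigned char)(hi >> 8);
            out[n++] = (unsigned char)(hi & 0xFF);
            out[n++] = (unsigned char)(lo >> 8);
            out[n++] = (unsigned char)(lo & 0xFF);
        }
    }
    return n;
}
```

Because the bytes are written out explicitly, the result does not depend on the host byte order. For the question's input "text data" (9 ASCII characters) it would produce 18 bytes.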

Looks like your system does not support that conversion. (This error means that iconv_open() failed with EINVAL.)

On my Linux system it does appear to be supported:

echo "Hello" | iconv --from-code UTF-16BE --to-code UTF-8

(Obviously "Hello" is not a valid UTF-16 string, but it does get converted to something, so the actual conversion seems to be supported.)

See if you have UTF-16BE in the output of "iconv --list".

In this particular case your simplest solution might be to just use g_utf8_to_utf16(): http://library.gnome.org/devel/glib/stable/glib-Unicode-Manipulation.html#g-utf8-to-utf16
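The same check can also be made from C by asking iconv directly, which may help separate a genuinely missing converter from a misspelled encoding name. This is a minimal sketch: conversion_supported is a hypothetical helper name, and it assumes a POSIX iconv (on some platforms you may need to link with -liconv).

```c
#include <errno.h>
#include <iconv.h>

/*
 * Hypothetical helper: returns 1 if iconv can convert from `from` to `to`,
 * 0 if the conversion is unsupported (iconv_open sets errno to EINVAL),
 * and -1 on any other failure.
 */
int conversion_supported(const char *to, const char *from)
{
    iconv_t cd = iconv_open(to, from); /* note the order: target first */

    if (cd == (iconv_t)-1)
        return errno == EINVAL ? 0 : -1;

    iconv_close(cd);
    return 1;
}
```

Calling conversion_supported("UTF-16BE", "UTF-8") on the affected machine should tell you whether the underlying iconv has the converter at all.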

You can easily do your own byte swap. Untested code (here len is assumed to be the number of 16-bit units in result_str):

if (G_BYTE_ORDER != G_BIG_ENDIAN) {
  for (i = 0; i < len; ++i) {
    result_str[i] = GUINT16_TO_BE(result_str[i]);
  }
}
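If you would rather not depend on the GLib byte-order macros, a variant in plain C is sketched below. utf16_units_to_be is a hypothetical name, and the function assumes the buffer holds `count` 16-bit units in host byte order, for example as returned by g_utf8_to_utf16().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: rewrites `count` 16-bit units in place so that each
 * one is stored high byte first (big endian), whatever the host byte order.
 * On a big-endian host this leaves the buffer unchanged.
 */
void utf16_units_to_be(uint16_t *units, size_t count)
{
    size_t i;

    assert(units != NULL || count == 0);

    for (i = 0; i < count; ++i) {
        unsigned char bytes[2];

        bytes[0] = (unsigned char)(units[i] >> 8);   /* high byte first */
        bytes[1] = (unsigned char)(units[i] & 0xFF); /* then low byte */
        memcpy(&units[i], bytes, sizeof bytes);
    }
}
```

After the call, the buffer should be treated as a sequence of bytes rather than as host-order integers.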
