String conversion from UTF-8 to UTF-16 big-endian is failing (using C, C++)
I am using the GLib function g_convert() to convert a UTF-8 string to a UTF-16 big-endian string. The conversion is failing with an error saying "conversion is not supported".
Could someone give me a clue on how to overcome this issue?
Thanks
Following is the piece of code used to convert the string from UTF-8 to UTF-16 big-endian:
unsigned short *result_str;
gsize bytes_read, bytes_written;
gssize len = 0;
GError *error = NULL;
result_str = (unsigned short *)g_convert("text data", len, "UTF-16BE", "UTF-8", &bytes_read, &bytes_written, &error);
Your len is 0. The GLib manual says that len must be -1 for a NUL-terminated string.
g_convert uses iconv under the covers.
On my machine, using Cygwin, I can run
iconv -l
which lists the supported encodings, and UTF-16BE does appear in the list. However:
$ iconv -l | grep BE
UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11
UCS-4BE
UTF-16BE
UTF-32BE
James@XPL3KWK28 ~
$ iconv -f UTF-8 -t UTF16-BE
iconv: conversion to UTF16-BE unsupported
iconv: try 'iconv -l' to get the list of supported encodings
As you can see, iconv reports the conversion to UTF16-BE as unsupported (note that this spelling differs from the UTF-16BE shown in the list).
If the conversion really is unavailable on your system, you probably need to do this in two stages: UTF-8 to UTF-16, then UTF-16 to UTF-16BE.
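To check from C whether the one-step conversion is available, a minimal sketch that calls POSIX iconv() directly might look like the following. It uses the spelling UTF-16BE as it appears in the iconv -l listing; the function name iconv_utf8_to_utf16be is hypothetical, and the sketch assumes an iconv whose input parameter is declared as char ** (as in glibc):

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts a NUL-terminated UTF-8 string to UTF-16BE
 * with POSIX iconv(). Returns the number of bytes written to out, or
 * (size_t)-1 if the conversion is unsupported or fails. */
size_t iconv_utf8_to_utf16be(const char *in, char *out, size_t out_size)
{
    /* iconv_open() takes the target encoding first, then the source. */
    iconv_t cd = iconv_open("UTF-16BE", "UTF-8");
    if (cd == (iconv_t)-1) {
        return (size_t)-1;  /* errno is EINVAL if the pair is unsupported */
    }

    char *in_ptr = (char *)in;      /* iconv() wants char **, input is only read */
    size_t in_left = strlen(in);
    char *out_ptr = out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1) {
        return (size_t)-1;  /* invalid input or output buffer too small */
    }
    return out_size - out_left;
}
```

If iconv_open() fails here, the same conversion requested through g_convert() can be expected to fail as well.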
I suspect UTF-16BE is not supported by g_convert (based on the error message). It's trivial to convert UTF-8 into UTF-16BE though (no tables or other garbage like that), so you can do that transformation yourself.
You might also want to check whether UTF-16 is supported and do your own byte swapping if necessary. But I do not believe g_convert supports UTF-16 either.
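As a rough illustration of doing the transformation by hand, the sketch below decodes each UTF-8 sequence to a code point and writes it out as UTF-16 with the high byte first, using a surrogate pair for code points above U+FFFF. The function name utf8_to_utf16be_manual and its error convention are assumptions made for this example, not part of any library:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts len bytes of UTF-8 in `in` to UTF-16BE bytes
 * in `out`. Returns the number of bytes written, or (size_t)-1 on malformed
 * input or when `out` is too small. */
size_t utf8_to_utf16be_manual(const unsigned char *in, size_t len,
                              unsigned char *out, size_t out_size)
{
    size_t i = 0, o = 0;

    while (i < len) {
        uint32_t cp;    /* decoded code point */
        uint32_t min;   /* smallest code point allowed for this length */
        size_t extra;   /* number of continuation bytes expected */
        unsigned char b = in[i++];

        if (b < 0x80)                { cp = b;        extra = 0; min = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else return (size_t)-1;      /* stray continuation or invalid lead byte */

        if (extra > len - i) return (size_t)-1;     /* truncated sequence */
        for (size_t k = 0; k < extra; ++k) {
            unsigned char c = in[i++];
            if ((c & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong forms, surrogates and values beyond U+10FFFF. */
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (out_size - o < 2) return (size_t)-1;
            out[o++] = (unsigned char)(cp >> 8);     /* high byte first */
            out[o++] = (unsigned char)(cp & 0xFF);
        } else {
            if (out_size - o < 4) return (size_t)-1;
            cp -= 0x10000;
            uint32_t hi = 0xD800 | (cp >> 10);       /* high surrogate */
            uint32_t lo = 0xDC00 | (cp & 0x3FF);     /* low surrogate */
            out[o++] = (unsigned char)(hi >> 8);
            out[o++] = (unsigned char)(hi & 0xFF);
            out[o++] = (unsigned char)(lo >> 8);
            out[o++] = (unsigned char)(lo & 0xFF);
        }
    }
    return o;
}
```

Because the output is written byte by byte, the result is big-endian regardless of the host's byte order, and no byte-order mark is emitted.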
Looks like your system does not support that conversion. (This error means that iconv_open() failed with EINVAL.)
On my Linux system it does appear to be supported:
echo "Hello" | iconv --from-code UTF-16BE --to-code UTF-8
(Obviously "Hello" is not a valid UTF-16 string, but it does get converted to something, so the conversion itself seems to be supported.)
See if you have UTF-16BE in the output of "iconv --list".
In this particular case your simplest solution might be to just use g_utf8_to_utf16(): http://library.gnome.org/devel/glib/stable/glib-Unicode-Manipulation.html#g-utf8-to-utf16
You can easily do your own byte swap (untested code):
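/* len is the number of 16-bit units in result_str, not a byte count */
if (G_BYTE_ORDER != G_BIG_ENDIAN) {
    for (glong i = 0; i < len; ++i) {
        /* swap each unit from host order to big-endian */
        result_str[i] = GUINT16_TO_BE(result_str[i]);
    }
}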