
How to get C++ std::string from Little-Endian UTF-16 encoded bytes

I have a 3rd party device that communicates with my Linux box over a proprietary communication protocol that isn't well documented. Some packets convey "strings" that, after reading this Joel On Software article, appear to be in UTF-16 Little-Endian encoding. In other words, what I have on my Linux box after receipt of such packets are things like

// The string "Out"
unsigned char data1[] = {0x4f, 0x00, 0x75, 0x00, 0x74, 0x00, 0x00, 0x00};

// The string "°F"
unsigned char data2[] = {0xb0, 0x00, 0x46, 0x00, 0x00, 0x00};

As I understand it, I cannot treat these as an std::wstring because on Linux a wchar_t is 4 bytes. I do, however, have one thing going for me in that my Linux box is also Little-Endian. So, I believe I need to use something like std::codecvt_utf8_utf16<char16_t>. However, even after reading the documentation, I cannot figure out how to actually go from an unsigned char[] to an std::string. Can someone please help?

If you wish to use std::codecvt (which is deprecated since C++17), you can wrap your UTF-16 text in a std::u16string and then convert it to UTF-8, if needed.

For example:

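#include <codecvt>  // std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>   // std::wstring_convert (deprecated since C++17)
#include <string>

// Cast the raw bytes for the constructor, since we know the array is really a
// byte buffer from the network API. This relies on the host being little-endian,
// on the data ending with a 16-bit NUL, and on the buffer being suitably
// aligned for char16_t.
std::u16string u16_str( reinterpret_cast<const char16_t*>(data2) );

// UTF-16/char16_t to UTF-8
std::string u8_conv = std::wstring_convert<std::codecvt_utf8_utf16<char16_t>,char16_t>{}.to_bytes(u16_str);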
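
The cast above depends on the host byte order, on buffer alignment, and on a 16-bit NUL terminator being present. As a minimal sketch that avoids those dependencies (and the deprecated facets), the bytes can be decoded by hand. The name utf16le_to_utf8 is hypothetical, not part of the answer or of any library, and replacing unpaired surrogates with U+FFFD is an assumption about the error handling you want.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the answer above): decodes `size` bytes of
// UTF-16LE into a UTF-8 std::string, stopping at the first 16-bit NUL.
std::string utf16le_to_utf8(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    // Assemble one 16-bit code unit from two bytes, low byte first.
    auto unit_at = [data](std::size_t pos) -> std::uint32_t {
        return static_cast<std::uint32_t>(data[pos]) |
               (static_cast<std::uint32_t>(data[pos + 1]) << 8);
    };

    std::string out;
    std::size_t i = 0;
    while (i + 1 < size) {
        std::uint32_t cp = unit_at(i);
        i += 2;
        if (cp == 0) break;  // 16-bit NUL terminator

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate, if any.
            const std::uint32_t lo = (i + 1 < size) ? unit_at(i) : 0;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD;  // assumption: replace an unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // assumption: replace a lone low surrogate
        }

        // Encode the code point as one to four UTF-8 bytes.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With the arrays from the question, utf16le_to_utf8(data1, sizeof data1) gives "Out", and utf16le_to_utf8(data2, sizeof data2) gives the UTF-8 bytes C2 B0 46, i.e. "°F".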

For the sake of completeness, here's the simplest iconv-based conversion I came up with:

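#include <iconv.h>

#include <cerrno>    // errno
#include <cstring>   // strerror
#include <iostream>  // std::cerr
#include <string>

auto iconv_eng = ::iconv_open("UTF-8", "UTF-16LE");
if (reinterpret_cast<::iconv_t>(-1) == iconv_eng)
{
  std::cerr << "Unable to create ICONV engine: " << strerror(errno) << std::endl;
}
else
{
  // src            a char * to utf16 bytes
  // src_size       the maximum number of bytes to convert
  // dest           a char * to utf8 bytes to generate
  // dest_size      the maximum number of bytes to write
  // iconv() advances src and dest and decrements both sizes, so remember
  // where the output buffer starts.
  char *dest_begin = dest;
  if (static_cast<std::size_t>(-1) == ::iconv(iconv_eng, &src, &src_size, &dest, &dest_size))
  {
    std::cerr << "Unable to convert from UTF16: " << strerror(errno) << std::endl;
  }
  else
  {
    // The converted UTF-8 bytes are [dest_begin, dest).
    std::string utf8_str(dest_begin, dest);
  }
  // Release the descriptor whether or not the conversion succeeded.
  ::iconv_close(iconv_eng);
}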
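
The snippet above leaves src, src_size, dest and dest_size for the caller to set up. A self-contained sketch of one way to do that is shown here. The function name utf16le_to_utf8_iconv is hypothetical, the glibc signature of iconv() (char ** for the input buffer) is assumed, and the output buffer is sized at three UTF-8 bytes per UTF-16 code unit, which covers the worst case.

```cpp
#include <iconv.h>

#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the answer above): converts `size` bytes of
// UTF-16LE to UTF-8 with iconv, stopping at the first 16-bit NUL if present.
std::string utf16le_to_utf8_iconv(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);

    // Payload length in bytes, excluding the terminator and any odd trailing byte.
    std::size_t len = 0;
    while (len + 1 < size && (data[len] != 0 || data[len + 1] != 0)) {
        len += 2;
    }
    if (len == 0) {
        return std::string();
    }

    // Copy the input so iconv() gets a mutable char buffer (assumed glibc signature).
    std::string in(reinterpret_cast<const char*>(data), len);
    // A UTF-16 code unit never needs more than three UTF-8 bytes.
    std::string out(len / 2 * 3, '\0');

    iconv_t cd = ::iconv_open("UTF-8", "UTF-16LE");
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error(std::string("iconv_open: ") + std::strerror(errno));
    }

    char* src = &in[0];
    std::size_t src_left = in.size();
    char* dest = &out[0];
    std::size_t dest_left = out.size();

    const std::size_t rc = ::iconv(cd, &src, &src_left, &dest, &dest_left);
    const int saved_errno = errno;  // iconv_close() may overwrite errno
    ::iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error(std::string("iconv: ") + std::strerror(saved_errno));
    }

    // dest_left is the unused tail of the output buffer.
    out.resize(out.size() - dest_left);
    return out;
}
```

Opening a conversion descriptor per call keeps the sketch simple; code that converts many packets would probably open it once and reuse it.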
