
How to get C++ std::string from Little-Endian UTF-16 encoded bytes

I have a 3rd party device that communicates with my Linux box over a proprietary protocol that isn't well documented. Some packets convey "strings" that, after reading this Joel On Software article, appear to be UTF-16 Little-Endian encoded. In other words, what I have on my Linux box after receiving such packets are things like

// The string "Out"
unsigned char data1[] = {0x4f, 0x00, 0x75, 0x00, 0x74, 0x00, 0x00, 0x00};

// The string "°F"
unsigned char data2[] = {0xb0, 0x00, 0x46, 0x00, 0x00, 0x00};

As I understand it, I cannot treat these as an std::wstring because on Linux a wchar_t is 4 bytes. I do, however, have one thing going for me in that my Linux box is also Little-Endian. So, I believe I need to use something like std::codecvt_utf8_utf16<char16_t>. However, even after reading the documentation, I cannot figure out how to actually go from an unsigned char[] to an std::string. Can someone please help?

If you wish to use std::codecvt_utf8_utf16 (deprecated since C++17, along with std::wstring_convert), you can wrap your UTF-16 text in a std::u16string and then convert it to UTF-8 if needed.

For example:

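// Simply cast the raw data for the constructor, since we know the unsigned
// char array is just a byte buffer from the network API. This relies on the
// buffer ending in a 0x0000 terminator, on the host being little-endian,
// and on the buffer being suitably aligned for char16_t.
std::u16string u16_str( reinterpret_cast<const char16_t*>(data2) );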

// UTF-16/char16_t to UTF-8 (needs <codecvt>, <locale> and <string>)
std::string u8_conv = std::wstring_convert<std::codecvt_utf8_utf16<char16_t>,char16_t>{}.to_bytes(u16_str);
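
The cast above depends on the host byte order and on a terminated, aligned buffer, and the conversion uses deprecated facilities. As an alternative, here is a minimal sketch that decodes the bytes by hand. The name utf16le_to_utf8 is hypothetical (it is not from the answer above), and replacing unpaired surrogates with U+FFFD is an assumption about the desired error handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode UTF-16LE bytes into a UTF-8 std::string.
// Stops at a 0x0000 terminator or at the end of the buffer, whichever
// comes first. Works on any host byte order and needs no alignment.
std::string utf16le_to_utf8(const unsigned char* bytes, std::size_t size)
{
    // Assemble one 16-bit code unit, low byte first.
    auto unit_at = [&](std::size_t pos) -> std::uint32_t {
        return static_cast<std::uint32_t>(bytes[pos]) |
               (static_cast<std::uint32_t>(bytes[pos + 1]) << 8);
    };

    std::string out;
    std::size_t i = 0;
    while (i + 1 < size) {
        std::uint32_t cp = unit_at(i);
        i += 2;
        if (cp == 0) {
            break;  // terminator reached
        }
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            std::uint32_t lo = (i + 1 < size) ? unit_at(i) : 0;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD;  // assumption: replace unpaired surrogates
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;      // stray low surrogate
        }

        // Encode the code point as 1 to 4 UTF-8 bytes.
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

With the arrays from the question, utf16le_to_utf8(data1, sizeof data1) gives "Out", and utf16le_to_utf8(data2, sizeof data2) gives the three bytes 0xC2 0xB0 0x46, which is "°F" in UTF-8.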

For the sake of completeness, here's the simplest iconv-based conversion I came up with:

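#include <iconv.h>
#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>

auto iconv_eng = ::iconv_open("UTF-8", "UTF-16LE");
if (reinterpret_cast<::iconv_t>(-1) == iconv_eng)
{
  std::cerr << "Unable to create ICONV engine: " << std::strerror(errno) << std::endl;
}
else
{
  // src            a char * to utf16 bytes
  // src_size       the maximum number of bytes to convert
  // dest           a char * to utf8 bytes to generate
  // dest_size      the maximum number of bytes to write
  // iconv advances src and dest (and decrements both sizes) as it works,
  // so remember where the output buffer starts.
  char *dest_begin = dest;
  if (static_cast<std::size_t>(-1) == ::iconv(iconv_eng, &src, &src_size, &dest, &dest_size))
  {
    std::cerr << "Unable to convert from UTF16: " << std::strerror(errno) << std::endl;
  }
  else
  {
    // dest now points one past the last UTF-8 byte written
    std::string utf8_str(dest_begin, dest);
  }
  // Release the conversion descriptor on both the success and error paths
  ::iconv_close(iconv_eng);
}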
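
To make the role of src, src_size, dest and dest_size concrete, here is a minimal sketch that wraps the same iconv calls in a function. The name utf16le_to_utf8_iconv, the terminator scan, and the choice to throw std::runtime_error on failure are assumptions made for illustration, and the non-const char ** input argument matches the glibc prototype on Linux.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical wrapper: convert a UTF-16LE byte buffer to a UTF-8 std::string.
std::string utf16le_to_utf8_iconv(const unsigned char* bytes, std::size_t size)
{
    // Find the payload length: stop at a 0x0000 code unit or the buffer end.
    std::size_t len = 0;
    while (len + 1 < size && (bytes[len] != 0 || bytes[len + 1] != 0)) {
        len += 2;
    }

    iconv_t cd = ::iconv_open("UTF-8", "UTF-16LE");
    if (cd == reinterpret_cast<iconv_t>(-1)) {
        throw std::runtime_error(std::string("iconv_open: ") + std::strerror(errno));
    }

    // Each 2-byte UTF-16 code unit needs at most 3 bytes of UTF-8
    // (a 4-byte surrogate pair becomes 4 bytes), so this is large enough.
    std::vector<char> buffer(len / 2 * 3 + 1);

    // iconv does not modify the input; the cast only satisfies the prototype.
    char* src = reinterpret_cast<char*>(const_cast<unsigned char*>(bytes));
    std::size_t src_left = len;
    char* dest = buffer.data();
    std::size_t dest_left = buffer.size();

    const std::size_t rc = ::iconv(cd, &src, &src_left, &dest, &dest_left);
    const int saved_errno = errno;
    ::iconv_close(cd);  // always release the descriptor

    if (rc == static_cast<std::size_t>(-1)) {
        throw std::runtime_error(std::string("iconv: ") + std::strerror(saved_errno));
    }

    // dest was advanced past the last byte written.
    return std::string(buffer.data(), static_cast<std::size_t>(dest - buffer.data()));
}
```

Sizing the output buffer from the input length avoids having to handle E2BIG and retry, at the cost of a somewhat larger temporary allocation.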
