

Unicode char to wstring

I'm trying to send a C# string to a C++ std::wstring, and vice versa (over TCP).

I succeeded in sending string data from C# (as Unicode, UTF-16) and receiving it in C++ as a char array.

But I have no idea how to convert the char array to a wstring.

This is what it looks like when C++ receives "abcd" encoded as UTF-16:

    [0] 97 'a'  char
    [1] 0 '\0'  char
    [2] 98 'b'  char
    [3] 0 '\0'  char
    [4] 99 'c'  char
    [5] 0 '\0'  char
    [6] 100 'd' char
    [7] 0 '\0'  char

This is what it looks like when C++ receives "한글" encoded as UTF-16:

    [0] 92 '\\' char
    [1] -43 '?' char
    [2] 0 '\0'  char
    [3] -82 '?' char

And this is what it looks like when C++ receives "日本語" encoded as UTF-16:

    [0] -27 '?' char
    [1] 101 'e' char
    [2] 44 ','  char
    [3] 103 'g' char
    [4] -98 '?' char
    [5] -118 '?' char

Since UTF-8 doesn't support all Japanese characters, I tried to receive the data as UTF-16 (which is what C# strings use internally). But I failed to convert these char arrays to a wstring with every approach I found.

This is what I tried:

std::wstring_convert<std::codecvt_utf16<wchar_t>> myconv;

What the wstring should contain:
    [0] 54620 '한'  wchar_t
    [1] 44544 '글'  wchar_t

What it actually contains after the conversion:
    [0] 23765 '峕'  wchar_t
    [1] 174 '®'     wchar_t


std::wstring wsTmp(s.begin(), s.end());

What the wstring should contain:
    [0] 54620 '한'  wchar_t
    [1] 44544 '글'  wchar_t

What it actually contains after this:
    [0] 92 '\\'     wchar_t
    [1] 65493 'ᅰ'   wchar_t
    [2] 0 '\0'      wchar_t
    [3] 65454 'ᆴ'   wchar_t

In both cases, I first converted the char array to a std::string and then converted that to a wstring, and both attempts failed.
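The values produced by the second attempt can be reproduced in isolation. The iterator-pair constructor turns every single byte into its own code unit instead of pairing bytes up, so two bytes per character become two code units. Where char is signed (as with typical x86 and MSVC compilers), the byte 0xD5 is stored as -43, and converting -43 to a 16-bit unsigned code unit gives 65536 - 43 = 65493; likewise 0xAE (-82) becomes 65454. The sketch below uses a hypothetical helper, widen_each_byte, and char16_t so that the code unit is 16 bits wide on every platform:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper that mimics `std::wstring wsTmp(s.begin(), s.end());`:
// each char (byte) is converted on its own into one 16-bit code unit.
// Bytes >= 0x80 held in a signed char are negative and wrap around to
// 65536 + value when converted to the unsigned char16_t.
std::u16string widen_each_byte(const std::string& s) {
    return std::u16string(s.begin(), s.end());
}
```

Fed the four UTF-16LE bytes of "한글" (92, 213, 0, 174), this yields four code units (92, 65493, 0, 65454 where char is signed), matching the dump above rather than the two expected characters.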

Does anyone know how to convert non-English UTF-16 char data to wstring data?

Update: C# side code:

byte[] sendBuffer = Encoding.Unicode.GetBytes(Console.ReadLine());
clientSocket.Send(sendBuffer);

It converts "한글" into these bytes:

    [0] 92  byte
    [1] 213 byte
    [2] 0   byte
    [3] 174 byte

Answer:

> I try to send C# string data to C++ wstring data and vice versa (by TCP).

> I succeeded in sending string data from C# (as Unicode, UTF-16) and receiving it in C++ as a char array.

It would be better, and more portable, to transmit the data using UTF-8 instead of UTF-16.
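As a rough sketch of what that could look like: the C# side would call Encoding.UTF8.GetBytes instead of Encoding.Unicode.GetBytes, and the C++ side can keep the received bytes in a std::string as-is, since UTF-8 needs no byte-order handling. If individual code points are needed, a minimal decoder such as the hypothetical decode_utf8 below could be used. It performs only basic structural checks and is not a full validator:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into Unicode code points.
// Checks lead bytes, continuation bytes and truncation only; overlong
// forms and encoded surrogates are NOT rejected.
std::u32string decode_utf8(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t extra = 0;  // number of continuation bytes that follow
        char32_t cp = 0;
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);  // append 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For example, the UTF-8 bytes E6 97 A5 E6 9C AC E8 AA 9E decode to the three code points of "日本語" (U+65E5, U+672C, U+8A9E).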

> But I have no idea how to convert a char array to a wstring.

On platforms where wchar_t is 16-bit, such as Windows (which I presume you are on, since you are using C#), you can copy your char array content as-is directly into a std::wstring, e.g.:

char *buffer = ...;
int buflen = ...;

std::wstring wstr(reinterpret_cast<wchar_t*>(buffer), buflen / sizeof(wchar_t));
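Copying as-is works there because C#'s Encoding.Unicode and wchar_t on Windows both use UTF-16 little-endian: every pair of bytes is one code unit, low byte first. The hypothetical helpers below show that arithmetic, and also what happens when the same pair is read big-endian (high byte first), which is consistent with the 23765 ('峕') and 174 ('®') values the question got from codecvt_utf16:

```cpp
#include <cstdint>

// Hypothetical helper: little-endian pairing (low byte first), as produced
// by C#'s Encoding.Unicode and used by wchar_t on Windows.
constexpr std::uint16_t code_unit_le(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>(b0 | (b1 << 8));
}

// Hypothetical helper: big-endian pairing (high byte first) of the same bytes.
constexpr std::uint16_t code_unit_be(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>((b0 << 8) | b1);
}
```

With the bytes from the question, code_unit_le(92, 213) is 54620 (U+D55C, '한') and code_unit_le(0, 174) is 44544 (U+AE00, '글'), while code_unit_be gives 23765 and 174 for the same pairs.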

If you need to support platforms where wchar_t is 32-bit instead, you can use std::wstring_convert (note that std::wstring_convert and std::codecvt_utf16 have been deprecated since C++17, although most standard libraries still provide them):

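#include <codecvt>
#include <locale>
#include <string>

char *buffer = ...;
int buflen = ...;

// codecvt_utf16 assumes big-endian input by default, but C#'s Encoding.Unicode
// produces little-endian bytes, so std::little_endian must be requested explicitly.
std::wstring_convert<std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>, wchar_t> conv;
std::wstring wstr = conv.from_bytes(std::string(buffer, buflen));
// or:
// std::wstring wstr = conv.from_bytes(buffer, buffer+buflen);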

Since wchar_t is not very portable, consider using std::u16string / char16_t instead (provided your compiler supports C++11 or later), as they were designed specifically for UTF-16 data.
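A minimal sketch of the char16_t route, assuming the buffer holds UTF-16LE bytes exactly as produced by Encoding.Unicode.GetBytes on the C# side. The helper names u16_from_le_bytes and u16_to_le_bytes are hypothetical; the second one covers the "vice versa" direction, producing bytes the C# side could decode with Encoding.Unicode.GetString. Assembling the code units byte by byte avoids depending on the size or byte order of wchar_t:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a std::u16string from UTF-16LE bytes.
// Each pair of bytes is one code unit, low byte first; surrogate pairs
// simply stay as two code units, which is what UTF-16 requires.
std::u16string u16_from_le_bytes(const char* buffer, std::size_t buflen) {
    assert(buflen % 2 == 0 && "UTF-16 data must contain whole 16-bit code units");
    std::u16string out;
    out.reserve(buflen / 2);
    for (std::size_t i = 0; i + 1 < buflen; i += 2) {
        const unsigned char lo = static_cast<unsigned char>(buffer[i]);
        const unsigned char hi = static_cast<unsigned char>(buffer[i + 1]);
        out.push_back(static_cast<char16_t>(lo | (hi << 8)));
    }
    return out;
}

// Hypothetical helper for the reverse direction: serialize a std::u16string
// as UTF-16LE bytes, ready to be sent back over the socket.
std::string u16_to_le_bytes(const std::u16string& s) {
    std::string out;
    out.reserve(s.size() * 2);
    for (char16_t c : s) {
        out.push_back(static_cast<char>(c & 0xFF));         // low byte first
        out.push_back(static_cast<char>((c >> 8) & 0xFF));  // then high byte
    }
    return out;
}
```

Applied to the four bytes from the question (92, 213, 0, 174), u16_from_le_bytes yields the two code units U+D55C and U+AE00, i.e. "한글", and u16_to_le_bytes turns them back into the same four bytes.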

> Since UTF-8 doesn't support all Japanese characters

Yes, it does. Unicode is the actual character set; UTFs are just encodings for representing Unicode code points as byte sequences. ALL UTFs (UTF-7, UTF-8, UTF-16, and UTF-32) support the ENTIRE Unicode character set, and they are designed to allow lossless conversion from one UTF to another.
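To illustrate, the sketch below encodes single code points by hand with two hypothetical helpers, to_utf8 and to_utf16 (lone surrogate code points are not rejected). 日 (U+65E5) becomes the bytes E6 97 A5 in UTF-8 and the single unit 65E5 in UTF-16. A character outside the Basic Multilingual Plane, such as U+20B9F from CJK Unified Ideographs Extension B, becomes the four bytes F0 A0 AE 9F in UTF-8 and the surrogate pair D842 DF9F in UTF-16, so neither encoding loses it:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode one code point as UTF-8 (1 to 4 bytes).
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("not a Unicode code point");
    }
    return out;
}

// Hypothetical helper: encode one code point as UTF-16 code units
// (one unit up to U+FFFF, a surrogate pair above that).
std::u16string to_utf16(char32_t cp) {
    if (cp > 0x10FFFF) throw std::invalid_argument("not a Unicode code point");
    if (cp < 0x10000) return std::u16string(1, static_cast<char16_t>(cp));
    const char32_t v = cp - 0x10000;  // 20 bits split across the pair
    return {static_cast<char16_t>(0xD800 + (v >> 10)),
            static_cast<char16_t>(0xDC00 + (v & 0x3FF))};
}
```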
