
Difference and conversions between wchar_t for Linux and for Windows

Original post: 2012-11-27 20:37:15 | Tags: c++, visual-c++, g++, wchar

I understand from this and this thread that on Windows wchar_t is 16 bits, while on Linux wchar_t is 32 bits.

I have a client-server architecture (using just pipes, not sockets) where my server is Windows-based and the client is Linux.

The server has an API to retrieve the hostname from the client. When the client is Windows-based, it can simply call GetComputerNameW and return a wide string. However, when the client is Linux-based, things get messy.

As a first naive approach, I used mbstowcs(), hoping to return a wchar_t* to the Windows server side. However, this LPWSTR (I have typedef wchar_t* LPWSTR on my Linux client side) is not recognizable on Windows, since Windows expects its wchar_t to be 16-bit.
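To make the mismatch concrete, here is a minimal sketch that reports how wide wchar_t is on the platform it is compiled for. The helper names wchar_bits and wide_byte_count are made up for illustration and are not part of the original post.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cwchar>

// Width of one wchar_t in bits on the platform this is compiled for.
// Typically 16 with Visual C++ on Windows and 32 with g++ on Linux.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// Number of bytes a wide string occupies, terminator excluded.
// Hypothetical helper: it only multiplies the length by sizeof(wchar_t).
std::size_t wide_byte_count(const wchar_t* text) {
    return std::wcslen(text) * sizeof(wchar_t);
}
```

With a 32-bit wchar_t, wide_byte_count(L"host") is 16, while a 16-bit wchar_t gives 8, so the raw bytes written by one side cannot be read as a wide string by the other.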

So, is converting the output of gethostname() on Linux, which is a char*, to unsigned short (16-bit) my only option?

Thanks in advance!

2 answers

You will have to decide on the actual protocol for transporting the data across the wire. There are several options, although UTF-8 is usually the most sensible one. It also means that under Linux you can basically use the data as-is (there is no reason to use wchar_t to begin with, although you can obviously convert it into whatever you want).
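As a rough sketch of the "use the data as-is" idea on the Linux client, the function below reads the host name as plain bytes and returns it unchanged. It assumes the host name is ASCII or already UTF-8; the name local_hostname is hypothetical, and how the bytes are framed on the pipe is left to your protocol.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unistd.h>

// Returns the local host name as the bytes gethostname() reports.
// Assumption: those bytes are ASCII or UTF-8, so they can be written
// to the pipe without any conversion.
std::string local_hostname() {
    // Linux limits host names to 64 bytes (HOST_NAME_MAX); 256 leaves headroom.
    char buffer[256] = {};
    // Pass one byte less than the buffer size so the zero-initialised
    // last byte always terminates the string, even on truncation.
    if (gethostname(buffer, sizeof buffer - 1) != 0) {
        throw std::runtime_error("gethostname failed");
    }
    return std::string(buffer);
}
```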

Under Windows you will have to convert the UTF-8 into the UTF-16 that Windows wants (yes, not exactly, but oh well), and if you want to send data you have to convert it to UTF-8. Luckily, Windows provides this and this function for exactly these purposes.

Obviously you can pick any encoding you want, not necessarily UTF-8; the process is the same: when receiving data, convert it to the native format of the OS; when sending, convert it to your on-wire encoding. iconv works on Linux if you don't use UTF-8.

You are best off choosing a standard character encoding for the data you send over the pipe, and then requiring all machines to send their data using that encoding.

Windows uses UTF-16LE, so you could choose to use UTF-16LE over the pipe. Windows machines can then send their UTF-16LE encoded strings as-is, but Linux machines would have to convert to/from UTF-16LE as needed.
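If you go the UTF-16LE route, the Linux side has to produce the 16-bit code units itself. The sketch below encodes Unicode code points as UTF-16LE bytes by hand; to_utf16le is a hypothetical name, and the input is taken as a std::u32string on the assumption that the text is already available as UTF-32 code points (which is what a 32-bit wchar_t holds with glibc).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Encodes Unicode code points as UTF-16LE bytes, without a byte order mark.
std::vector<std::uint8_t> to_utf16le(const std::u32string& code_points) {
    std::vector<std::uint8_t> bytes;
    // Appends one 16-bit code unit, low byte first (little endian).
    auto put_unit = [&bytes](std::uint16_t unit) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));
        bytes.push_back(static_cast<std::uint8_t>(unit >> 8));
    };
    for (char32_t cp : code_points) {
        // Reject values outside Unicode and lone surrogates.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("not a Unicode scalar value");
        }
        if (cp < 0x10000) {
            // Basic Multilingual Plane: one code unit.
            put_unit(static_cast<std::uint16_t>(cp));
        } else {
            // Supplementary planes: a high and a low surrogate.
            const char32_t v = cp - 0x10000;
            put_unit(static_cast<std::uint16_t>(0xD800 + (v >> 10)));
            put_unit(static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return bytes;
}
```

On the Windows side these bytes can be read directly into a wchar_t buffer, since each pair of bytes is one 16-bit code unit in native order.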

Or you could choose UTF-8 instead, which would reduce network bandwidth, but both Windows and Linux machines would have to convert to/from UTF-8 as needed. For network communications, UTF-8 would be the better choice.

On Windows, you can use MultiByteToWideChar() and WideCharToMultiByte() with the CP_UTF8 codepage.
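A minimal sketch of how those two calls might be wrapped is shown below. The wrapper names utf8_to_wide and wide_to_utf8 are hypothetical; the code is guarded with _WIN32 because it only builds against the Windows headers. Each function calls the API twice: once to ask for the required size, once to convert.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>

// UTF-8 bytes received from the pipe -> UTF-16 wide string for Windows APIs.
std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();  // the API rejects a zero length
    const int in_len = static_cast<int>(utf8.size());
    // First call: ask how many wchar_t units are needed.
    const int n = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                      utf8.data(), in_len, nullptr, 0);
    if (n <= 0) throw std::runtime_error("invalid UTF-8 input");
    std::wstring wide(n, L'\0');
    // Second call: do the conversion into the sized buffer.
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), in_len, &wide[0], n);
    return wide;
}

// UTF-16 wide string -> UTF-8 bytes to send over the pipe.
std::string wide_to_utf8(const std::wstring& wide) {
    if (wide.empty()) return std::string();
    const int in_len = static_cast<int>(wide.size());
    // For CP_UTF8 the last two arguments must be null.
    const int n = WideCharToMultiByte(CP_UTF8, 0, wide.data(), in_len,
                                      nullptr, 0, nullptr, nullptr);
    if (n <= 0) throw std::runtime_error("conversion to UTF-8 failed");
    std::string utf8(n, '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), in_len,
                        &utf8[0], n, nullptr, nullptr);
    return utf8;
}
#endif  // _WIN32
```

Passing an explicit length instead of -1 means the terminating null is not counted in the result, which suits length-prefixed or otherwise framed messages.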

On Linux, use the iconv() API, which lets you specify the UTF-8 charset for encoding/decoding.
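One way the iconv() calls could be put together is sketched here. The function name convert_encoding is hypothetical, and the signature used for iconv() is the glibc one (char** input pointer); other platforms may declare it slightly differently.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Converts `input` from charset `from` to charset `to`, e.g. "UTF-8" and "UTF-16LE".
std::string convert_encoding(const std::string& input, const char* from, const char* to) {
    // Note the argument order: target charset first, source charset second.
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::string output;
    char buffer[256];
    // iconv() takes a non-const pointer but does not modify the input bytes.
    char* in_ptr = const_cast<char*>(input.data());
    std::size_t in_left = input.size();

    while (in_left > 0) {
        char* out_ptr = buffer;
        std::size_t out_left = sizeof buffer;
        const std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
        const int err = errno;  // save before any other call can change it
        output.append(buffer, sizeof buffer - out_left);
        // E2BIG only means the output buffer filled up: loop again.
        if (rc == (std::size_t)-1 && err != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return output;
}
```

For example, convert_encoding(name, "UTF-8", "UTF-16LE") would turn a host name into the byte layout Windows expects, and swapping the two charset arguments converts in the other direction.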