Difference and conversions between wchar_t for Linux and for Windows

I understand from these two threads that on Windows wchar_t is 16 bits, while on Linux it is 32 bits.

I have a client-server architecture (using only pipes, not sockets) in which the server is Windows-based and the client is Linux-based.

The server has an API that retrieves the hostname from the client. When the client is Windows-based, it can simply call GetComputerNameW and return a wide string. When the client is Linux-based, however, things get messy.

As a first, naive approach, I used mbstowcs(), hoping to return a wchar_t* to the Windows server. However, this LPWSTR (I have typedef wchar_t* LPWSTR on my Linux client side) is not recognized on Windows, because Windows expects its wchar_t to be 16 bits.
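
To see concretely why the server cannot use this string, here is a small C sketch. It assumes a little-endian Linux machine, where wchar_t is 32 bits, and the helper name utf16_units_seen_by_windows is hypothetical. It repeats the mbstowcs() step from the question and then reads the resulting bytes the way a Windows receiver would, as 16-bit little-endian units, counting how many it gets through before hitting a zero unit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

#define WIDE_CAP 256

/* Hypothetical helper: converts `narrow` with mbstowcs() as in the question,
 * then counts how many 16-bit units a UTF-16LE reader would consume before
 * it finds a 0x0000 terminator. Returns (size_t)-1 on conversion failure. */
size_t utf16_units_seen_by_windows(const char *narrow)
{
    wchar_t wide[WIDE_CAP];
    size_t n = mbstowcs(wide, narrow, WIDE_CAP);
    if (n == (size_t)-1 || n >= WIDE_CAP)
        return (size_t)-1; /* invalid multibyte input, or no room for the terminator */

    /* The raw bytes exactly as they would be written to the pipe. */
    const unsigned char *bytes = (const unsigned char *)wide;
    size_t units = (n + 1) * sizeof(wchar_t) / 2;

    for (size_t i = 0; i < units; i++) {
        /* Windows interprets each pair of bytes as one little-endian UTF-16 unit. */
        uint16_t unit = (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
        if (unit == 0)
            return i;
    }
    return units;
}
```

For "host" the result is 1: the upper half of the 32-bit 'h' is a zero 16-bit unit, so Windows sees the string end after the first character. Built on Windows, where wchar_t is 16 bits, the same function returns 4.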

So is converting the char* output of gethostname() on Linux into unsigned short (16-bit) values my only option?

Thanks in advance!

You will have to decide on the actual protocol for transporting the data across the wire. There are several options, although UTF-8 is usually the most sensible one. It also means that on Linux you can basically use the data as-is (there is no reason to use wchar_t to begin with, although you can obviously convert it into whatever you want).
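
A minimal sketch of the Linux sending side under that recommendation, assuming the hostname returned by gethostname() is already UTF-8 (plain ASCII hostnames are). The framing, a 4-byte little-endian length followed by the raw bytes with no terminator, is an illustrative assumption rather than part of the answer, and frame_utf8 and send_hostname are hypothetical names.

```c
#define _XOPEN_SOURCE 700 /* exposes gethostname() in strict C modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical framing: 4-byte little-endian length, then the UTF-8 bytes.
 * Returns the number of bytes written to `out`, or 0 if `out` is too small. */
size_t frame_utf8(const char *text, unsigned char *out, size_t out_cap)
{
    size_t len = strlen(text);
    if (len > UINT32_MAX || out_cap < 4 + len)
        return 0;
    uint32_t n = (uint32_t)len;
    out[0] = (unsigned char)(n & 0xFF);
    out[1] = (unsigned char)((n >> 8) & 0xFF);
    out[2] = (unsigned char)((n >> 16) & 0xFF);
    out[3] = (unsigned char)((n >> 24) & 0xFF);
    memcpy(out + 4, text, len);
    return 4 + len;
}

/* Hypothetical sender: writes this machine's hostname to the already-open
 * pipe descriptor `fd`. Returns 0 on success, -1 on failure. */
int send_hostname(int fd)
{
    char host[256];
    unsigned char frame[4 + sizeof host];

    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0'; /* termination is not guaranteed on truncation */

    size_t total = frame_utf8(host, frame, sizeof frame);
    if (total == 0)
        return -1;

    /* Loop to handle short writes. */
    size_t sent = 0;
    while (sent < total) {
        ssize_t w = write(fd, frame + sent, total - sent);
        if (w < 0)
            return -1;
        sent += (size_t)w;
    }
    return 0;
}
```

Prefixing the length means the Windows reader knows exactly how many bytes to hand to its UTF-8 decoder, and no wchar_t ever crosses the pipe.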

On Windows you will have to convert the UTF-8 into the UTF-16 that Windows wants (strictly speaking it is not exactly UTF-16, but close enough), and if you want to send data you have to convert it back to UTF-8. Luckily, Windows provides MultiByteToWideChar and WideCharToMultiByte, respectively, for exactly these purposes.
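
On Windows itself the receiver would call MultiByteToWideChar with the CP_UTF8 code page. To show what that conversion involves, here is a portable hand-written stand-in (not the Windows API); utf8_to_utf16 is a hypothetical name. Characters above U+FFFF become surrogate pairs, and, as a simplifying assumption, malformed input is rejected rather than replaced.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable decoder: converts `len` bytes of UTF-8 into UTF-16
 * code units in `out` (capacity `cap`). Returns the number of units written,
 * or (size_t)-1 on malformed input or insufficient space. */
size_t utf8_to_utf16(const unsigned char *in, size_t len, uint16_t *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = in[i];
        uint32_t cp;
        size_t extra;

        if (b < 0x80)                    { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1; /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1)
            return (size_t)-1; /* truncated sequence */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong forms, encoded surrogates, and values above U+10FFFF. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return (size_t)-1;
        i += extra + 1;

        if (cp < 0x10000) {
            if (n + 1 > cap)
                return (size_t)-1;
            out[n++] = (uint16_t)cp;
        } else {
            /* Split into a high and a low surrogate. */
            if (n + 2 > cap)
                return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return n;
}
```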

Obviously you can choose any encoding you want, not necessarily UTF-8; the process is the same: when receiving data, convert it to the native format of the OS, and when sending, convert it to your on-wire encoding. On Linux, iconv does the job if you don't use UTF-8.
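
On Linux, the "convert to the on-wire encoding when sending" step can be written once with iconv(3) for whatever wire encoding you pick. In this sketch, to_wire is a hypothetical helper that converts from the process's current locale encoding, as reported by nl_langinfo(CODESET), into a wire encoding given by its iconv name, such as "UTF-8" or "UTF-16LE". A real program would call setlocale(LC_ALL, "") at startup; in the default "C" locale the codeset is typically plain ASCII, so only ASCII input converts.

```c
#define _XOPEN_SOURCE 700
#include <iconv.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated string `in` from the current
 * locale's encoding into `wire_encoding` (an iconv name such as "UTF-8"),
 * writing at most `out_cap` bytes to `out`. Returns the byte count written, or
 * (size_t)-1 if the pair is unsupported, the input is invalid, or `out` is too small. */
size_t to_wire(const char *wire_encoding, const char *in, char *out, size_t out_cap)
{
    iconv_t cd = iconv_open(wire_encoding, nl_langinfo(CODESET));
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;      /* iconv() takes char **, but does not modify the input */
    size_t src_left = strlen(in);
    char *dst = out;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &dst, &dst_left); /* emit any final shift sequence */
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : out_cap - dst_left;
}
```

Receiving is the same conversion in the other direction, with iconv_open(nl_langinfo(CODESET), wire_encoding).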

You are best off choosing a standard character encoding for the data you send over the pipe, and then requiring all machines to send their data in that encoding.

Windows uses UTF-16LE, so you could choose UTF-16LE for the pipe. Windows machines could then send their UTF-16LE strings as-is, but Linux machines would have to convert to/from UTF-16LE as needed.
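
If you opt for UTF-16LE on the pipe, the Linux client has to produce those bytes itself. A minimal iconv(3) sketch, assuming the gethostname() result is UTF-8 (true for ASCII hostnames); utf8_to_utf16le is a hypothetical helper. Naming the target "UTF-16LE" rather than plain "UTF-16" pins the byte order to little-endian, and no byte-order mark is written.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts the NUL-terminated UTF-8 string `utf8` into
 * UTF-16LE bytes in `out`. Returns the number of bytes written, or (size_t)-1
 * on invalid UTF-8 (EILSEQ/EINVAL) or insufficient space (E2BIG). */
size_t utf8_to_utf16le(const char *utf8, unsigned char *out, size_t out_cap)
{
    iconv_t cd = iconv_open("UTF-16LE", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)utf8;   /* iconv() takes non-const char ** */
    size_t src_left = strlen(utf8);
    char *dst = (char *)out;
    size_t dst_left = out_cap;

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_cap - dst_left;
}
```

The Windows server can then read the received bytes directly as byte_count / 2 WCHAR units.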

Or you could choose UTF-8 instead, which reduces bandwidth for mostly-ASCII data such as hostnames, but then both Windows and Linux machines would have to convert to/from UTF-8 as needed. For network communications, UTF-8 would be the better choice.
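
The bandwidth difference depends on the text. The sketch below (utf16_size_of_utf8 is a hypothetical helper, and the input is assumed to be valid UTF-8) computes how many bytes a UTF-8 string would take as UTF-16: an ASCII hostname such as "host" is 4 bytes in UTF-8 but 8 in UTF-16, while the two CJK characters 日本 take 6 bytes in UTF-8 and only 4 in UTF-16.

```c
#include <stddef.h>

/* Hypothetical helper: given `len` bytes of valid UTF-8, returns the size in
 * bytes of the same text encoded as UTF-16. Each non-continuation byte starts
 * one code point; 4-byte sequences (U+10000 and above) need a surrogate pair. */
size_t utf16_size_of_utf8(const unsigned char *s, size_t len)
{
    size_t bytes = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char b = s[i];
        if ((b & 0xC0) == 0x80)
            continue;                 /* continuation byte: no new code point */
        bytes += (b >= 0xF0) ? 4 : 2; /* surrogate pair vs. one 16-bit unit */
    }
    return bytes;
}
```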

On Windows, you can use MultiByteToWideChar() and WideCharToMultiByte() with the CP_UTF8 code page.
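
The opposite direction on Windows is WideCharToMultiByte with CP_UTF8. Here is a portable hand-written equivalent that shows what the encoder does (utf16_to_utf8 is a hypothetical name, not the Windows API). It combines each surrogate pair into one code point before emitting 1 to 4 UTF-8 bytes and, as a simplifying assumption, rejects unpaired surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable encoder: converts `len` UTF-16 code units from `in`
 * into UTF-8 bytes in `out` (capacity `cap`). Returns the number of bytes
 * written, or (size_t)-1 on an unpaired surrogate or insufficient space. */
size_t utf16_to_utf8(const uint16_t *in, size_t len, unsigned char *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        uint32_t cp = in[i++];

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            if (i >= len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i++] - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1; /* lone low surrogate */
        }

        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (need > cap - n)
            return (size_t)-1;

        switch (need) {
        case 1:
            out[n++] = (unsigned char)cp;
            break;
        case 2:
            out[n++] = (unsigned char)(0xC0 | (cp >> 6));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[n++] = (unsigned char)(0xE0 | (cp >> 12));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[n++] = (unsigned char)(0xF0 | (cp >> 18));
            out[n++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return n;
}
```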

On Linux, use the iconv() API, which lets you specify UTF-8 as the charset for encoding/decoding.
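
If the Linux client does want wchar_t internally, as in the question, it can decode the received UTF-8 with iconv() instead of mbstowcs(), so the result no longer depends on the current locale. This sketch assumes the encoding name "WCHAR_T", which glibc and GNU libiconv accept for the platform's native wchar_t encoding; other iconv implementations may not. utf8_to_wchar is a hypothetical helper.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decodes the NUL-terminated UTF-8 string `utf8` into
 * `out` (capacity `cap` wchar_t elements, including the terminator).
 * Returns the number of wide characters written, excluding the terminator,
 * or (size_t)-1 on error. */
size_t utf8_to_wchar(const char *utf8, wchar_t *out, size_t cap)
{
    if (cap == 0)
        return (size_t)-1;

    /* Assumption: this iconv implementation understands "WCHAR_T". */
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)utf8;
    size_t src_left = strlen(utf8);
    char *dst = (char *)out;
    size_t dst_left = (cap - 1) * sizeof(wchar_t); /* reserve room for L'\0' */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    size_t count = ((cap - 1) * sizeof(wchar_t) - dst_left) / sizeof(wchar_t);
    out[count] = L'\0';
    return count;
}
```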

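Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465 © 2020-2024 STACKOOM.COM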