

Making size_t and wchar_t portable?

To my understanding, the representation of size_t and wchar_t is completely platform/compiler specific. For instance, I have read that wchar_t on Linux is now usually 32-bit, but on Windows it is 16-bit. Is there any way that I can standardize these to a set size (int, long, etc.) in my own code, while still maintaining backwards compatibility with the existing standard C libraries and functions on both platforms?

My goal is essentially to do something like typedef them so they are a set size. Is this possible without breaking something? Should I do this? Is there a better way?

UPDATE: The reason I'd like to do this is so that my string encoding is consistent across both Windows and Linux.

Thanks!

Sounds like you're looking for C99's and C++0x's <stdint.h> / <cstdint> headers. These define types like uint8_t and int64_t.

You can use Boost's cstdint.hpp if you don't have those headers.

You don't want to redefine those types. Instead, you can use typedefs like int32_t or int16_t (signed 32-bit and 16-bit), which are part of <stdint.h> in the C standard library.
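For illustration, here is a minimal sketch of that idea; the alias names (code_unit16, file_size64, etc.) are made up for the example, not part of any standard:

    #include <cstdint>
    #include <cstddef>

    // Fixed-width aliases: the storage width no longer depends on the
    // platform's wchar_t or size_t.
    typedef std::uint16_t code_unit16;  // always 16 bits (e.g. a UTF-16 code unit)
    typedef std::uint32_t code_unit32;  // always 32 bits (e.g. a UTF-32 code point)
    typedef std::uint64_t file_size64;  // widths/sizes serialized as fixed 64-bit values

    int main() {
        std::size_t n = 42;                                   // platform-dependent width
        file_size64 on_disk = static_cast<file_size64>(n);    // fixed width for I/O
        (void)on_disk;
        return 0;
    }

Note this only pins the in-memory width; it does not make the values interchangeable with wchar_t strings or with size_t-taking library functions without a cast.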

If you're using C++, C++0x will add char16_t and char32_t, which are new types (not just typedefs for integral types) intended for UTF-16 and UTF-32.
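Assuming a compiler with C++0x/C++11 support, those types come with their own literal prefixes, as in this small sketch:

    // Requires a C++11 (formerly "C++0x") compiler.
    #include <string>

    int main() {
        char16_t c = u'A';              // UTF-16 code unit, 16 bits on every platform
        char32_t d = U'A';              // UTF-32 code point, 32 bits on every platform
        std::u16string s16 = u"hello";  // string of char16_t
        std::u32string s32 = U"hello";  // string of char32_t
        (void)c; (void)d; (void)s16; (void)s32;
        return 0;
    }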

For wchar_t, an alternative is to just use a library like ICU, which implements Unicode in a platform-independent way. Then you can just use the UChar type, which will always be UTF-16; you do still need to be careful about endianness. ICU also provides converters to and from UChar (UTF-16).
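A rough sketch of one such converter, assuming the ICU headers and the icuuc library are installed (link with something like -licuuc); the buffer size is arbitrary for the example:

    #include <unicode/ustring.h>   // u_strFromUTF8
    #include <unicode/utypes.h>    // UChar, UErrorCode, u_errorName
    #include <cstdio>

    int main() {
        const char *utf8 = "h\xC3\xA9llo";   // "héllo" encoded as UTF-8
        UChar buf[64];                        // UChar is always a UTF-16 code unit
        int32_t len = 0;
        UErrorCode status = U_ZERO_ERROR;

        u_strFromUTF8(buf, 64, &len, utf8, -1, &status);   // UTF-8 -> UTF-16
        if (U_FAILURE(status)) {
            std::printf("conversion failed: %s\n", u_errorName(status));
            return 1;
        }
        std::printf("UTF-16 length: %d code units\n", (int)len);
        return 0;
    }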

No. The fundamental problem with trying to use a typedef to "fix" a character type is that you end up with something that on some platforms is consistent with the built-in functions and with wide character literals, and on other platforms is not.

If you want a string format which is the same on all platforms, you could just pick a size and signedness. You want unsigned 8-bit "characters", or signed 64-bit "characters"? You can have them on any platform which has an integer type of the appropriate size (not all do). But they're not really characters as far as the language is concerned, so don't expect to be able to call strlen or wcslen on them, or to have a nice syntax for literals. A string literal is (well, converts to) a char*, not a signed char* or an unsigned char*. A wide string literal is a wchar_t*, which is equivalent to some other integer type, but not necessarily the one you want it to be.

So, you have to pick an encoding, use that internally, define your own versions of the string functions you need, implement them, then convert to/from the platform's encoding as necessary for functions that take strings. UTF-8 is a decent option because most of the C string functions still "work", in the sense that they do something fairly useful even if it isn't entirely correct.
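As one concrete illustration of that point (strlen "works" on UTF-8 but counts bytes, so a code-point counter has to be hand-rolled); the helper name utf8_codepoints is invented for the example and does no validation:

    #include <cstring>
    #include <cstdio>

    // Count code points by skipping UTF-8 continuation bytes (10xxxxxx).
    static std::size_t utf8_codepoints(const char *s) {
        std::size_t count = 0;
        for (; *s; ++s) {
            if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
                ++count;
        }
        return count;
    }

    int main() {
        const char *s = "caf\xC3\xA9";   // "café" in UTF-8: 5 bytes, 4 code points
        std::printf("bytes: %zu, code points: %zu\n", std::strlen(s), utf8_codepoints(s));
        return 0;
    }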

wchar_t is going to be a stickier wicket, possibly, than size_t. One could assume a maximum size for size_t (say, 8 bytes) and cast all variables to that before writing them to a file (or socket). One other thing to keep in mind is that you are going to have byte-ordering issues if you are trying to write/read some sort of binary representation. Anyway, wchar_t may represent a UTF-32 encoding on one system (I believe Linux does this) and could represent a UTF-16 encoding on another system (Windows does this). If you are trying to create a standard format between platforms, you are going to have to resolve all of these issues.
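A minimal sketch of that "widen and fix the byte order before writing" idea; the 8-byte width and little-endian order are arbitrary choices for the example, not a standard, and the helper name write_u64_le is hypothetical:

    #include <cstdint>
    #include <cstddef>
    #include <cstdio>

    // Write a 64-bit value as 8 bytes, least significant byte first,
    // regardless of the host's native byte order.
    static void write_u64_le(std::FILE *f, std::uint64_t v) {
        unsigned char bytes[8];
        for (int i = 0; i < 8; ++i)
            bytes[i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFF);
        std::fwrite(bytes, 1, 8, f);
    }

    int main() {
        std::size_t n = 12345;                      // platform-dependent width
        std::FILE *f = std::fopen("lengths.bin", "wb");
        if (!f) return 1;
        write_u64_le(f, static_cast<std::uint64_t>(n));   // fixed width, fixed order on disk
        std::fclose(f);
        return 0;
    }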

Just work with UTF-8 internally, and convert to UTF-16 just in time when passing arguments to Windows functions that require it. UTF-32 is probably never needed. Since it's usually wrong (in a Unicode sense) to process individual characters instead of strings, it's no more difficult to capitalize or normalize a UTF-8 string than a UTF-32 string.
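A Windows-only sketch of converting at the API boundary, using the Win32 MultiByteToWideChar call; the helper name utf8_to_utf16 and the MessageBoxW usage are just for illustration:

    #ifdef _WIN32
    #include <windows.h>
    #include <string>

    // Widen a UTF-8 std::string to a UTF-16 std::wstring (wchar_t is 16-bit on Windows).
    static std::wstring utf8_to_utf16(const std::string &utf8) {
        if (utf8.empty()) return std::wstring();
        int len = MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), (int)utf8.size(), NULL, 0);
        std::wstring out(len, L'\0');
        MultiByteToWideChar(CP_UTF8, 0, utf8.c_str(), (int)utf8.size(), &out[0], len);
        return out;
    }

    int main() {
        std::string title = "caf\xC3\xA9";   // UTF-8 kept internally
        MessageBoxW(NULL, utf8_to_utf16(title).c_str(), L"demo", MB_OK);  // UTF-16 only at the boundary
        return 0;
    }
    #else
    int main() { return 0; }   // nothing to demonstrate off Windows
    #endif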
