Why short* instead of char* for string? Difference between char* and unsigned char*?

Original post: 2012-02-15 14:39:51. Tags: c, character-encoding, char, unsigned, short

As the title says, I have two questions.

Edit: To clarify, they don't actually use char and short; specific typedefs ensure that the types are 8-bit and 16-bit. The actual types are called UInt8 and UInt16.

1. Question

The iTunes SDK uses unsigned short* where a string is needed. What are the advantages of using it instead of char* / unsigned char*? How do I convert it to char*, and what is different when working with this type?

2. Question

So far I've only seen char* used when a string must be stored. When should I use unsigned char* then, or doesn't it make any difference?

3 answers

unsigned short arrays can be used for wide character strings, for instance if you have UTF-16 encoded text, although I'd expect to see wchar_t in those cases. But they may have their reasons, like being compatible between MacOS and Windows. (If my sources are right, MacOS' wchar_t is 32 bits, while Windows' is 16 bits.)

You convert between the two types of string by calling the appropriate library function. Which function is appropriate depends on the situation. Doesn't the SDK come with one?
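When no library function is at hand, the conversion can be written by hand. The following is only a sketch under stated assumptions: the 16-bit strings are assumed to hold NUL-terminated UTF-16 in native byte order, the char strings are assumed to want UTF-8, and the name utf16_to_utf8 is hypothetical (it is not part of the iTunes SDK).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an SDK function.
 * Converts a NUL-terminated UTF-16 string (native byte order) to UTF-8.
 * Returns the number of bytes written, excluding the terminating NUL,
 * or (size_t)-1 if the input is malformed or the buffer is too small. */
size_t utf16_to_utf8(const uint16_t *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return (size_t)-1;

    while (*src != 0) {
        uint32_t cp = *src++;
        size_t need;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow. */
            uint32_t lo = *src;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            src++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;          /* unpaired low surrogate */
        }

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need >= dst_size)     /* keep one byte for the NUL */
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (char)cp;
        } else if (need == 2) {
            dst[out++] = (char)(0xC0 | (cp >> 6));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (char)(0xE0 | (cp >> 12));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (char)(0xF0 | (cp >> 18));
            dst[out++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    dst[out] = '\0';
    return out;
}
```

A UTF-8 result can need up to three bytes per 16-bit input unit, so a buffer of 3 * length + 1 bytes is always large enough. If the SDK ships its own conversion routine, that is the safer choice, since it knows what encoding the strings really use.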

As for char instead of unsigned char: strings have historically always been defined with char, so switching to unsigned char would introduce incompatibilities.
(Switching to signed char would also cause incompatibilities, but somehow not as many...)

Edit: The question has since been edited; I didn't see the edits before I typed my answer. But yes, UInt16 is a better representation of a 16-bit entity than wchar_t, for the reason given above.

1. Question - Answer

I would suppose that they use unsigned short* because they are using the UTF-16 encoding for Unicode characters, which can represent characters both inside and outside the BMP (Basic Multilingual Plane). The rest of your question depends on the Unicode encoding of the source and the destination (UTF-8, UTF-16, UTF-32).
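One practical consequence of UTF-16 is that a character outside the BMP occupies two 16-bit units (a surrogate pair), so the number of array elements is not the number of characters. A minimal sketch, assuming well-formed, NUL-terminated UTF-16 input; both function names are made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit units before the terminating 0 (like strlen). */
size_t utf16_code_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Number of Unicode code points, assuming well-formed UTF-16.
 * A low surrogate (0xDC00..0xDFFF) is the second half of a pair,
 * so it is not counted on its own. */
size_t utf16_code_points(const uint16_t *s)
{
    size_t n = 0;
    for (; *s != 0; s++) {
        if (*s < 0xDC00 || *s > 0xDFFF)
            n++;
    }
    return n;
}
```

For the three-unit string { 'A', 0xD83D, 0xDE00 }, the first function reports 3 units and the second reports 2 code points.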

2. Question - Answer

Again, it depends on the type of encoding and which strings you are talking about. You should never use signed or unsigned char if you plan to deal with strings containing characters outside the extended ASCII table (that is, any language other than English).

Probably a harebrained attempt to use UTF-16 strings. C has a wide character type, wchar_t, and its chars (or wchar_ts) can be 16 bits long. Though I'm not familiar enough with the SDK to say exactly why they went this route, it's probably to work around compiler issues. In C99 there are much more suitable [u]int[least/fast]16_t types - see <stdint.h>.
Note that C makes very few guarantees about data types and their underlying sizes. Signed or unsigned shorts aren't guaranteed to be 16 bits (though they are guaranteed to be at least that wide), nor are chars restricted to 8 bits or wide chars to 16 or 32.
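The typedefs mentioned in the question are one way to cope with that. The sketch below shows how a header for compilers without <stdint.h> might pin the widths down; the my_uint8 and my_uint16 names are hypothetical and are not the SDK's actual definitions.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>

/* Hypothetical SDK-style typedefs for one particular platform. */
typedef unsigned char  my_uint8;
typedef unsigned short my_uint16;

/* Pre-C11 compile-time checks: the array size becomes -1, and the build
 * fails, if a type does not have the expected width on this compiler.
 * sizeof * CHAR_BIT counts storage bits, including any padding bits. */
typedef char my_uint8_is_8_bits[(sizeof(my_uint8) * CHAR_BIT == 8) ? 1 : -1];
typedef char my_uint16_is_16_bits[(sizeof(my_uint16) * CHAR_BIT == 16) ? 1 : -1];

/* Storage width of short on the current platform, in bits. */
size_t bits_in_short(void)
{
    return sizeof(short) * CHAR_BIT;
}
```

On a platform where short is wider than 16 bits this file refuses to compile, which is the point: the typedef would have to be changed there rather than silently storing UTF-16 in the wrong type.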
To convert between char and short strings, you'd use the conversion functions provided by the SDK. You could also write your own or use a third-party library, if you knew exactly what they stored in those short strings AND what you wanted in your char strings.
As for char versus unsigned char, it doesn't really make a difference. You'd normally convert to unsigned char if you wanted to do (unsigned) arithmetic or bit manipulation on a character.
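A short sketch of where that conversion matters, assuming a platform where plain char may be signed. The function names are made up for illustration; isupper is the standard <ctype.h> function, whose argument must be representable as unsigned char (or be EOF).

```c
#include <ctype.h>
#include <stddef.h>

/* Sum of the byte values of a string. Without the cast, bytes above
 * 0x7F would be added as negative numbers where char is signed. */
unsigned byte_sum(const char *s)
{
    unsigned sum = 0;
    for (; *s != '\0'; s++)
        sum += (unsigned char)*s;
    return sum;
}

/* Count uppercase letters. Passing a negative char value to isupper
 * is undefined behavior, hence the cast. */
size_t count_upper(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```

The strings themselves stay char*, as the standard library expects; only the individual values are converted at the point where their numeric value is used.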

Edit: I wrote (or started writing, anyhow) this answer before you told us they used UInt16 and not unsigned short. In that case there are no hare brains involved; the proprietary type is probably used to store UTF-16 data while staying compatible with older (or noncompliant) compilers that don't have the stdint types. Which is perfectly reasonable.