
Size of char type in C#

Just wondering why the char type in C# (.NET) is 2 bytes, unlike the 1-byte char found in some other programming languages?

A char is Unicode in C#, so the number of possible characters far exceeds 256. That's why you need two bytes.
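Since Java's char is the same 16-bit UTF-16 code unit as C#'s, a minimal Java sketch illustrates the point (the hiragana character used here is just an arbitrary example of a value that wouldn't fit in one byte):

```java
public class CharSize {
    public static void main(String[] args) {
        // A char is a 16-bit UTF-16 code unit, not an 8-bit byte.
        System.out.println(Character.SIZE);   // 16 bits
        System.out.println(Character.BYTES);  // 2 bytes

        // Values above 255 fit in a single char; one byte could not hold them.
        char hiragana = 'あ';                 // U+3042
        System.out.println((int) hiragana);   // 12354
    }
}
```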

Extended ASCII, for example, has a 256-character set and can therefore be stored in a single byte. That's also the whole purpose of the System.Text.Encoding namespace: different systems can have different charsets and char sizes. C# can therefore handle one-/two-/four-byte chars, but Unicode UTF-16 is the default.
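The effect of choosing a charset can be seen by encoding the same text under different encodings and counting the bytes. A sketch in Java, whose StandardCharsets mirrors the role of System.Text.Encoding:

```java
import java.nio.charset.StandardCharsets;

public class EncodingSizes {
    public static void main(String[] args) {
        String s = "héllo"; // five characters, one of them non-ASCII

        // Same text, different byte counts depending on the chosen charset.
        System.out.println(s.getBytes(StandardCharsets.ISO_8859_1).length); // 5: one byte per char
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);      // 6: 'é' takes two bytes
        System.out.println(s.getBytes(StandardCharsets.UTF_16LE).length);   // 10: two bytes per char
    }
}
```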

I'm guessing that by "other programming languages" you mean C. C actually has two different char types: char and wchar_t. char may be one byte long; wchar_t is not necessarily one byte.

In C# (and .NET, for that matter), all character strings are encoded as Unicode in UTF-16. That's why a char in .NET represents a single UTF-16 code unit, which may be a code point or half of a surrogate pair (not actually a character, then).
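Java strings use the same UTF-16 model, so the "half of a surrogate pair" case can be sketched there. The emoji U+1F600 lies outside the Basic Multilingual Plane, so it needs two chars for one code point:

```java
public class SurrogatePairs {
    public static void main(String[] args) {
        // U+1F600 (grinning face) requires a surrogate pair in UTF-16:
        // two code units (chars) encode a single code point.
        String emoji = "\uD83D\uDE00"; // 😀
        System.out.println(emoji.length());                          // 2 code units
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 code point
        // Each char on its own is not a character, just half a pair.
        System.out.println(Character.isHighSurrogate(emoji.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(emoji.charAt(1)));  // true
    }
}
```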

Actually, the size of char in C#, or more accurately in the CLR, is consistent with most other managed languages. Managed languages like Java tend to be newer and have features like Unicode support built in from the ground up. The natural extension of supporting Unicode strings is to have Unicode chars.

Older languages like C/C++ started with ASCII only and added Unicode support later.

Because strings in .NET are encoded as 2-byte Unicode characters.

Because characters in a C# string default to Unicode's UTF-16 encoding, which is 2 bytes (by default).

C#'s use of a 16-bit character width probably has more to do with performance than anything else.

Firstly, if you use UTF-8 you can fit every character in the "right" amount of space. This is because UTF-8 is variable-width: ASCII characters use 8 bits, while larger characters use more.
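The variable width is easy to observe by encoding single code points and counting bytes; a sketch in Java, with four sample characters chosen to cover the one- to four-byte cases:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    public static void main(String[] args) {
        // UTF-8 spends between one and four bytes per code point.
        String[] samples = { "A", "é", "中", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.println(s + " -> "
                + s.getBytes(StandardCharsets.UTF_8).length + " byte(s)");
        }
        // A -> 1, é -> 2, 中 -> 3, 😀 -> 4
    }
}
```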

But variable-length character encoding encourages O(n) algorithmic complexity in common scenarios, e.g. retrieving the character at a particular position in a string. There have been public discussions on this point. The simplest solution is to use a character width that fits most of your charset and truncate the rest; then you have a fixed character width.
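The indexing difference can be sketched in Java: charAt indexes fixed-width 16-bit code units directly, whereas finding the nth code point must walk the string precisely because code points have variable width (which is what offsetByCodePoints does):

```java
public class IndexingCost {
    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', 😀 (a surrogate pair), 'b'

        // charAt indexes UTF-16 code units directly: O(1).
        System.out.println(s.charAt(0)); // 'a'

        // Reaching the nth *code point* must scan from the start,
        // because code points are variable-width: O(n).
        int idx = s.offsetByCodePoints(0, 2); // index of the 3rd code point
        System.out.println(idx);              // 3, not 2: the emoji used two units
        System.out.println(s.charAt(idx));    // 'b'
    }
}
```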

Strictly speaking, UTF-16 is also a variable-width encoding, so C# (and Java, for that matter) use something of a hybrid, since their character width is never 32 bits.
