C Language: Why can an int variable store a char?
I have recently been reading The C Programming Language by Kernighan. There is an example that declares a variable as int but stores the result of getchar() in it.
int x;
x = getchar();
Why can we store char data in an int variable? The only thing I can think of is ASCII and Unicode. Am I right?
The getchar function (and similar character input functions) returns an int because of EOF. There are cases where (char) EOF != EOF (for example, when char is an unsigned type).
Also, in many places where one uses a char variable, it will silently be promoted to int anyway. And that includes character constants like 'A', which in C already have type int.
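A small sketch can make the promotion visible. The helper names below (char_constant_size, char_sum_size) are made up for illustration. The code assumes it is compiled as C, since C++ gives character literals type char.

```c
#include <stddef.h>

/* Size of a character constant: in C, 'A' has type int, not char. */
size_t char_constant_size(void)
{
    return sizeof 'A';
}

/* Size of the sum of two char values: both operands are promoted
   (normally to int) before the addition, so the result is not a char. */
size_t char_sum_size(void)
{
    char a = 'A';
    char b = 1;
    return sizeof(a + b);
}
```

Both functions should return sizeof(int), while sizeof(char) is always 1.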
getchar is an old C standard function, and the philosophy back then was closer to how the language gets translated to assembly than to type correctness and readability. Keep in mind that compilers did not optimize code as much as they do today. In early C, int was the default return type (i.e. before C99, if a function had no declaration, compilers assumed that it returned int), and a value is returned in a register, so returning a char instead of an int actually generates additional implicit code to mask out the extra bytes of the value. Thus, many old C functions prefer to return int.
C requires int to have at least as many bits as char. Therefore, int can store the same values as char (allowing for signed/unsigned differences). In most cases, int is a lot larger than char.
char is an integer type intended to store a character code from the implementation-defined character set, which is required to be compatible with C's abstract basic character set. (ASCII qualifies, and so do the source and execution character sets allowed by your compiler, including the one you are actually using.)
For the sizes and ranges of the integer types (char included), see your <limits.h>.
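As a rough illustration of what <limits.h> provides, the sketch below reports the limits of char and int on whatever system compiles it. The function names are hypothetical; the macros (CHAR_BIT, CHAR_MIN, CHAR_MAX, UCHAR_MAX, INT_MIN, INT_MAX) are standard.

```c
#include <limits.h>
#include <stdio.h>

/* Returns 1 if plain char is a signed type on this implementation. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Prints the ranges of char and int as reported by <limits.h>. */
void print_char_and_int_limits(void)
{
    printf("CHAR_BIT  = %d\n", CHAR_BIT);
    printf("char      = %ld .. %ld\n", (long)CHAR_MIN, (long)CHAR_MAX);
    printf("UCHAR_MAX = %lu\n", (unsigned long)UCHAR_MAX);
    printf("int       = %d .. %d\n", INT_MIN, INT_MAX);
}
```

Calling print_char_and_int_limits() from main shows the actual numbers for your compiler. The output differs between platforms, which is the point of consulting the header.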
getchar() attempts to read a byte from the standard input stream. The return value can be any possible value of the type unsigned char (from 0 to UCHAR_MAX), or the special value EOF, which is specified to be negative.
On most current systems, UCHAR_MAX is 255 because bytes have 8 bits, and EOF is defined as -1, but the C Standard does not guarantee this: some systems have larger unsigned char types (9 bits, 16 bits...), and it is possible, although I have never seen it, for EOF to be defined as another negative value.
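The usual reading loop follows directly from this: keep the result in an int and compare it with EOF before doing anything else. The sketch below uses a hypothetical helper, count_bytes, and getc(stream) so that it works on any stream; getchar() is equivalent to getc(stdin).

```c
#include <stdio.h>

/* Counts the bytes remaining in a stream. The result of getc() is kept
   in an int so that EOF stays distinct from every valid byte value. */
long count_bytes(FILE *stream)
{
    long count = 0;
    int c;                      /* int, not char */

    while ((c = getc(stream)) != EOF) {
        count++;
    }
    return count;
}
```

Calling count_bytes(stdin) counts everything up to end of input, including bytes with the value 255.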
Storing the return value of getchar() (or getc(fp)) in a char would prevent proper detection of end of file. Consider these cases (on common systems):
- If char is an 8-bit signed type, a byte value of 255, which is the character ÿ in the ISO8859-1 character set, has the value -1 when converted to a char. Comparing this char to EOF will yield a false positive.
- If char is unsigned, converting EOF to char will produce the value 255, which is different from EOF, preventing the detection of end of file.
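Both cases can be reproduced without reading any input. The sketch below uses signed char and unsigned char explicitly so that it does not depend on how plain char behaves on your platform. The function names are invented for illustration, and the first result assumes a common system (8-bit bytes, EOF defined as -1).

```c
#include <stdio.h>

/* Case 1: the byte 0xFF stored into a signed char. On common systems the
   (implementation-defined) conversion yields -1, which then compares
   equal to an EOF of -1: a false end of file. */
int byte_ff_looks_like_eof(void)
{
    int byte = 0xFF;                         /* a valid data byte */
    signed char stored = (signed char)byte;
    return stored == EOF;
}

/* Case 2: EOF stored into an unsigned char. The conversion wraps the
   value into 0..UCHAR_MAX, so on common systems it no longer equals
   the negative EOF and the end of file is never detected. */
int eof_survives_unsigned_char(void)
{
    unsigned char stored = (unsigned char)EOF;
    return stored == EOF;
}
```

On a typical desktop system the first function returns 1 and the second returns 0, matching the two bullet points above.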
These are the reasons for storing the return value of getchar() in an int variable. This value can later be converted to a char, once the test for end of file has failed.
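One way to apply this, sketched under the assumption that you want to collect one line into a char buffer: test the int against EOF first, and only then narrow it to char. read_line is a hypothetical helper, not a standard function.

```c
#include <stddef.h>
#include <stdio.h>

/* Reads one line from stream into buf (at most size-1 characters) and
   null-terminates it. The newline is consumed but not stored.
   Returns the number of characters stored, or -1 if size is 0 or if
   end of file (or an error) occurs before any character is read. */
long read_line(FILE *stream, char *buf, size_t size)
{
    size_t n = 0;
    int c;                          /* int until the EOF test is done */

    if (size == 0) {
        return -1;
    }
    while (n + 1 < size) {
        c = getc(stream);
        if (c == EOF) {
            if (n == 0) {
                buf[0] = '\0';
                return -1;
            }
            break;
        }
        if (c == '\n') {
            break;
        }
        buf[n++] = (char)c;         /* c is a real character here */
    }
    buf[n] = '\0';
    return (long)n;
}
```

A caller would typically loop with while (read_line(stdin, buf, sizeof buf) >= 0) and process buf on each iteration.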
Storing an int into a char has implementation-defined behavior if the char type is signed and the value of the int is outside the range of the char type. This is a technical problem that would have been avoided by mandating an unsigned char type, but the C Standard allowed for the many existing implementations where the char type was signed. It would take a vicious implementation to produce unexpected behavior for this simple conversion.
The value of the char does indeed depend on the execution character set. Most current systems use ASCII or some extension of ASCII such as ISO8859-x, UTF-8, etc. But the C Standard supports other character sets such as EBCDIC, where the lowercase letters do not form a contiguous range.
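Because the letters are not guaranteed to be contiguous, a range test such as c >= 'a' && c <= 'z' is only safe on ASCII-like systems. A sketch of the portable alternative, using islower from <ctype.h> (count_lowercase itself is a made-up example function):

```c
#include <ctype.h>
#include <stddef.h>

/* Counts lowercase letters in a string without assuming that 'a'..'z'
   are contiguous. The cast to unsigned char matters: the <ctype.h>
   functions take an int whose value must be representable as an
   unsigned char (or be EOF). */
size_t count_lowercase(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (islower((unsigned char)*s)) {
            count++;
        }
    }
    return count;
}
```

In the default "C" locale, count_lowercase("Hello, World") returns 8.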
C was designed as a very low-level language, so it is close to the hardware. Usually, after a bit of experience, you can predict how the compiler will allocate memory, and even pretty accurately what the machine code will look like.
Your intuition is right: it goes back to ASCII. ASCII is really a simple 1:1 mapping from letters (which make sense in human language) to integer values (which can be worked with by hardware); for every letter there is a unique integer. For example, the 'letter' CTRL-A is represented by the decimal number 1. (For historical reasons, lots of control characters came first, so CTRL-G, which rang the bell on an old teletype terminal, is ASCII code 7. Upper-case 'A' and the 25 remaining upper-case letters start at 65, and so on. See http://www.asciitable.com/ for a full list.)
C lets you 'coerce' variables into other types. In other words, the compiler cares about (1) the size, in memory, of the variable (see 'pointer arithmetic' in K&R), and (2) what operations you can do on it.
You can do arithmetic on a char: in an expression it is promoted to int first, so an explicit cast to int is optional. So, to convert all lower-case letters to upper case (assuming ASCII), you can do something like:
char letter;
/* ... */
if (letter >= 'a' && letter <= 'z') {   /* letter is lower case (ASCII) */
    letter = (char)(letter - 32);       /* 'a' - 'A' is 32 in ASCII */
}
Compilers do not require a cast before adding or subtracting; at most, some warn about the narrowing conversion back to char when such warnings are enabled.
But, in the end, the type char is just a small integer type, since the character set assigns a unique integer to each letter.
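For comparison, here is a sketch of the same case conversion written two ways. The function names are hypothetical. The first version relies on the standard toupper from <ctype.h>; the second assumes an ASCII-compatible character set, where the letters are contiguous.

```c
#include <ctype.h>

/* Portable version: toupper() makes no assumption about the
   character set. */
char to_upper_portable(char letter)
{
    return (char)toupper((unsigned char)letter);
}

/* ASCII-only version: plain arithmetic on a char. No cast to int is
   needed for the subtraction, because letter is promoted to int
   automatically; the cast back to char only documents the narrowing. */
char to_upper_ascii(char letter)
{
    if (letter >= 'a' && letter <= 'z') {
        letter = (char)(letter - ('a' - 'A'));
    }
    return letter;
}
```

Both functions turn 'q' into 'Q' and leave characters that are not lower-case letters unchanged.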