
Confusion about EOF and ÿ

In my GCC on Windows, the value of EOF is -1, and I noticed that the value of 'ÿ' is also -1. So I ran the following experiment, and I'm totally confused by the results.

#include <stdio.h>

int main(void) {
    
    int a = 'ÿ';
    if (a == EOF) {
        putchar('a');
        putchar(a);
    }

    char b = 'ÿ';
    if (b == EOF) {
        putchar('b');
        putchar(b);
    }

    putchar('\n');

    int c;
    if ((c = getchar()) != EOF) {
        putchar('c');
        putchar(c);
    }

    char d;
    if ((d = getchar()) != EOF) {
        putchar('d');
        putchar(d);
    }
}

The results are:

aÿbÿ  // a == EOF b == EOF
ÿÿ    //My input for int c and char d
cÿ    // c != EOF

My questions are:

1. When I directly assign 'ÿ' to a variable, whether its type is int or char, it equals EOF. But when I read 'ÿ' from stdin into int c, c does not equal EOF. What happened here?
2. How does the system distinguish between 'ÿ' and EOF if there is a 'ÿ' in the file?

'ÿ' is the character representation of the number 255. Its value as a char literal, as your compiler evaluates it, is -1.

Both 255 and -1 have the same 8-bit representation (11111111); which value you get depends on whether those bits are interpreted as signed or unsigned. char is signed on your compiler, therefore the value as a char is -1.

When it is assigned to a char variable, it is stored as is.
When it is assigned to an int variable, the value is promoted to int. This does not change its value; it is only represented using more bits (typically 4 bytes).

Incidentally, -1 is also the value of EOF (but you should always use the constant EOF in the code and never rely on its numeric value).
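To make the signed/unsigned point concrete, here is a minimal sketch, assuming 8-bit bytes and a two's-complement machine. The helper names are made up for illustration. Both functions start from the same bit pattern and widen it to int, once through signed char and once through unsigned char.

```c
#include <string.h>

/* Hypothetical helper: view one byte as signed char, then widen to int. */
int widen_as_signed(unsigned char byte) {
    signed char s;
    memcpy(&s, &byte, 1);   /* copy the bit pattern, no value conversion */
    return s;               /* widening to int preserves the value */
}

/* Hypothetical helper: view the same byte as unsigned char, then widen to int. */
int widen_as_unsigned(unsigned char byte) {
    return byte;            /* always in the range 0..255 for 8-bit bytes */
}
```

Under those assumptions, widen_as_signed(0xFF) gives -1 while widen_as_unsigned(0xFF) gives 255: the same bits, two different int values.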


getchar() returns an int; for 'ÿ' it returns 255.

When it is assigned to an int, the value is preserved.

When it is assigned to a char, the value 255 does not fit (the range of a signed 8-bit char is -128 .. +127), so the result is implementation-defined rather than guaranteed by the language.
Your compiler stores the rightmost 8 bits of 255 in the char variable and, because char is signed, the value is interpreted as -1.
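The effect of storing a getchar() result in a char can be sketched as follows. This is only an illustration: collides_with_eof is a hypothetical name, signed char stands in for a plain char that happens to be signed, and the wrapping behaviour is what GCC does, not something the C standard promises.

```c
#include <stdio.h>

/* Hypothetical helper: mimics `char d = getchar(); d == EOF`
   on a platform where plain char is signed. */
int collides_with_eof(int ch) {
    /* Implementation-defined when ch > SCHAR_MAX; GCC keeps the low 8 bits. */
    signed char d = (signed char)ch;
    return d == EOF;   /* d is promoted back to int for the comparison */
}
```

With GCC and EOF defined as -1, collides_with_eof(255) is true, which is why the (d) branch in the question treats a real 'ÿ' as end of file.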

How does the system distinguish between 'ÿ' and EOF if there's a 'ÿ' in the file?

getchar(), fgetc()/getc() and other functions that read characters return int. This means they always return a value between 0 and 255 (inclusive) when they succeed, and EOF (which has a negative value) when the end of the file is reached or an error occurs.

The value of EOF is negative, so it cannot be confused with the 255 returned for 'ÿ'.
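A minimal sketch of a read loop that relies on this, assuming 8-bit bytes; count_ff_bytes is a name invented for the example. Because the result of fgetc() is kept in an int, a 0xFF byte in the file is seen as 255 and never ends the loop early.

```c
#include <stdio.h>

/* Hypothetical helper: counts bytes equal to 0xFF until end of file or error. */
long count_ff_bytes(FILE *fp) {
    long count = 0;
    int ch;   /* int, not char, so 255 and EOF stay distinct */

    while ((ch = fgetc(fp)) != EOF) {
        if (ch == 0xFF) {
            count++;
        }
    }
    return count;
}
```

After the loop, feof(fp) and ferror(fp) can tell a real end of file apart from a read error.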

A C program has an execution character set, and this determines how character literals are mapped to integer values.

It seems like your program is being compiled with iso-8859-1 as the execution character set. On my computer, the default for gcc is utf-8, where 'ÿ' maps to the "multi-character constant" 50111. With iso-8859-1, gcc maps it to -1. I have to use the flag -fexec-charset=iso-8859-1 to reproduce what you're seeing.
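The number 50111 is not arbitrary. In UTF-8, ÿ (U+00FF) is encoded as the two bytes 0xC3 0xBF, and GCC's documentation describes evaluating a multi-character constant one character at a time, shifting the previous value left by the width of a character and OR-ing in the next one. A small sketch of that arithmetic, assuming 8-bit characters (pack_two_chars is a made-up name, not a GCC function):

```c
/* Hypothetical helper: combine two bytes the way a two-character
   constant is evaluated, assuming 8-bit characters. */
int pack_two_chars(unsigned char first, unsigned char second) {
    return (first << 8) | second;
}
```

pack_two_chars(0xC3, 0xBF) yields 0xC3BF, which is 50111 in decimal.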

When you read from a file (or from stdin), you get whatever bytes the operating system gives you (interpreted as an unsigned character). The encoding of stdin and files is in general independent from the execution character set.

What you're observing is that the execution character set is iso-8859-1, mapped to the range -128 to 127 (rather than the usual 0 to 255), presumably on the rationale that char is signed on your compiler and so can represent every value in the execution character set. The encoding for stdin also seems to be iso-8859-1, except that it uses the usual 0 to 255. In case (d) in your question, the value 255 is being assigned to a char (which is probably signed, from -128 to 127), and gcc is wrapping it.

Summary:

  • (a) assigns -1 to a
  • (b) assigns -1 to b
  • (c) assigns 255 to c
  • (d) converts 255 to a char, resulting in -1. This -1 is assigned to d.
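One practical consequence: when comparing a value returned by getchar() or fgetc() against a character literal outside the ASCII range, it helps to convert the literal to unsigned char first, so that both sides use the 0 to 255 range. A minimal sketch, assuming 8-bit bytes; is_byte_ff is a hypothetical name and '\xFF' stands in for 'ÿ' under iso-8859-1:

```c
#include <stdio.h>

/* Hypothetical helper: nonzero if ch, a getchar()/fgetc() result, is the byte 0xFF. */
int is_byte_ff(int ch) {
    /* '\xFF' may be -1 where char is signed; the cast brings it back to 255. */
    return ch != EOF && ch == (unsigned char)'\xFF';
}
```

This returns true for 255 and false for EOF, regardless of whether plain char is signed on the compiler in use.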
