
fgetc returns an unknown character

I have the following code:

FILE *f = fopen('/path/to/some/file', 'rb');
char c;
while((c = fgetc(f)) != EOF)
{
    printf("next char: '%c', '%d'", c, c);
}

For some reason, when printing out the characters, an un-renderable character gets printed at the end of the file, along with the ASCII ordinal -1.

next char: '?', '-1'

What character is this supposed to be? I know it's not EOF because there's a check for that, and shortly after the character is printed, the program segfaults.

The trouble is that fgetc() and its relatives return an int, not a char:

If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF.

It has to be able to return every possible valid character value plus one distinct value, EOF (which is negative, and usually but not necessarily -1).

When you read the value into a char instead of an int, one of two undesirable things happens:

  • If plain char is unsigned, then you never get a value equal to EOF, so the loop never terminates.

  • If plain char is signed, then a legitimate character, 0xFF (often ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS), is treated the same as EOF, so you detect EOF prematurely.

Either way, it is not good.
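The two cases above can be sketched in isolation. This is a minimal illustration rather than anything from the original post; the function names are hypothetical, and the signed case assumes the usual two's-complement conversion and an EOF of -1, neither of which the standard guarantees.

```c
#include <stdio.h>

/* Case 1: an unsigned char can never compare equal to EOF.
 * The value is promoted to a non-negative int, while EOF is negative.
 * Returns 1 when the comparison fails, as expected. */
int unsigned_char_never_matches_eof(void)
{
    unsigned char c = (unsigned char)EOF; /* truncated, typically to 255 */
    return c != EOF;
}

/* Case 2: the legitimate byte 0xFF, stored in a signed char, becomes
 * negative (implementation-defined; -1 on typical platforms) and then
 * compares equal to EOF when EOF is -1. Returns 1 when that happens. */
int byte_ff_matches_eof_as_signed_char(void)
{
    int from_fgetc = 0xFF;                  /* what fgetc() returns for byte 0xFF */
    signed char c = (signed char)from_fgetc;
    return c == EOF;
}

/* With an int, the byte 0xFF (255) and EOF stay distinct.
 * Returns 1 when they differ. */
int int_keeps_byte_ff_distinct(void)
{
    int c = 0xFF;
    return c != EOF;
}
```

On a typical desktop platform all three functions return 1, which matches the two failure modes described above and shows why int avoids both.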

The Fix

The fix is to use int c; instead of char c;.
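As a sketch of the corrected loop, the version below wraps it in a helper so it can be reused on any open stream. The name dump_stream and the out parameter are assumptions added for illustration; the original code printed straight to standard output.

```c
#include <stdio.h>

/* Prints every byte of `in` to `out` in the question's format and
 * returns the number of bytes read. */
long dump_stream(FILE *in, FILE *out)
{
    int c;          /* int, so EOF stays distinct from every byte value */
    long count = 0;

    while ((c = fgetc(in)) != EOF) {
        fprintf(out, "next char: '%c', '%d'\n", c, c);
        count++;
    }
    return count;
}
```

Calling dump_stream(f, stdout) on the file from the question should reproduce the intended output, and a 0xFF byte in the middle of the file no longer ends the loop early.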


Incidentally, the fopen() call should not compile:

FILE *f = fopen('/path/to/some/file', 'rb');

It should be:

FILE *f = fopen("/path/to/some/file", "rb");

Always check the result of fopen(); of all the I/O functions, it is more prone to failure than almost any other (not through its own fault, but because the user or programmer makes a mistake with the file name).
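One way to do that check is sketched below. The helper name open_for_reading is hypothetical, and reporting through perror() is just one reasonable choice. Passing a null stream to fgetc() is undefined behavior, so an unchecked failed fopen() may also be what causes the crash described in the question.

```c
#include <stdio.h>

/* Opens `path` for binary reading. On failure, reports the reason on
 * stderr and returns NULL so the caller can stop before reading. */
FILE *open_for_reading(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        perror(path);   /* prints the path followed by the system error text */
    }
    return f;
}
```

A caller would test the returned pointer against NULL and return or exit instead of entering the read loop.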

This is the culprit:

char c;

Please change it to:

int c;

The return type of fgetc is int, not char. You get strange behavior on some platforms when you convert the int to a char.

