
Why does fgetc() return int instead of char?

I would like to copy the binary file source to the file target. Nothing more! The code is inspired by many examples found on the Internet.

#include <stdio.h>

int main(int argc, char **argv) {

    FILE *fp1, *fp2;
    char ch;

    fp1 = fopen("source.pdf", "r");
    fp2 = fopen("target.pdf", "w");

    while((ch = fgetc(fp1)) != EOF)
        fputc(ch, fp2);

    fclose(fp1);
    fclose(fp2);

    return 0;

}

The result differs in file size.

root@vm:/home/coder/test# ls -l
-rwxr-x--- 1 root root 14593 Feb 28 10:24 source.pdf
-rw-r--r-- 1 root root   159 Mar  1 20:19 target.pdf

Ok, so what's the problem?

I know that char is unsigned and gets signed when above 80 (see here).

This is confirmed when I use printf("%x\n", ch); which, approximately 50% of the time, prints something like FFFFFFE1.

The solution to my issue would be to use int instead of char.

Examples found with char: example 1, example 2, example 3, example 4, ...

Examples found with int: example a, ...

I don't use fancy compiler options.

Why do virtually all the code examples I found assign the return value of fgetc() to a char instead of an int, which would be more correct?

What am I missing?

ISO C mandates that fgetc() returns an int since it must be able to return every possible character in addition to an end-of-file indicator.

So code that places the return value into a char, and uses it to detect EOF, is generally plain wrong and should not be used.


Having said that, two of the examples you gave don't actually do that.

One of them uses fseek and ftell to get the number of bytes in the file and then uses that to control the read/write loop. That could be problematic since the file can actually change in size after the size is retrieved, but that's a different problem from trying to force an int into a char.
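A minimal sketch of that size-driven approach, for illustration only. The helper names stream_size and copy_counted are hypothetical, not taken from the linked example, and the sketch assumes a platform where seeking to SEEK_END on a binary stream is meaningful (ISO C does not strictly guarantee this):

```c
#include <stdio.h>

/* Hypothetical helper: size in bytes of an open binary stream, or -1 on
   error. Leaves the stream positioned at the start. */
long stream_size(FILE *fp)
{
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;
    long size = ftell(fp);              /* offset of the end == byte count */
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return -1L;
    return size;
}

/* Hypothetical helper: copies at most `count` bytes from in to out and
   returns how many were actually copied. The loop is driven by the byte
   count, but EOF is still checked in case the file shrank meanwhile. */
long copy_counted(FILE *in, FILE *out, long count)
{
    long copied = 0;
    while (copied < count) {
        int c = fgetc(in);              /* int, so EOF stays distinguishable */
        if (c == EOF)
            break;
        if (fputc(c, out) == EOF)
            break;
        copied++;
    }
    return copied;
}
```

Because the loop stops on the byte count rather than on a sentinel value, a 0xFF byte in the data does not end the copy early.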

The other uses feof immediately after the character is read to check if the end of file has been reached.
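Roughly, that pattern could look like the sketch below. The function name copy_with_feof is made up for illustration, and unsigned char is my choice here so that the narrowing conversion is well defined; the linked example may differ in detail:

```c
#include <stdio.h>

/* Hypothetical helper: copies in to out, detecting the end of input with
   feof()/ferror() instead of comparing the stored byte against EOF.
   Returns the number of bytes copied. */
long copy_with_feof(FILE *in, FILE *out)
{
    long copied = 0;
    for (;;) {
        /* The int result is narrowed on purpose; it is never compared
           with EOF, so the lost information does not matter here. */
        unsigned char ch = (unsigned char)fgetc(in);
        if (feof(in) || ferror(in))
            break;                      /* the read above did not yield a byte */
        if (fputc(ch, out) == EOF)
            break;
        copied++;
    }
    return copied;
}
```

This works because feof only reports true after a read has actually hit the end of the file, which is why the check has to come after fgetc and before the byte is used.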


But you're correct in that the easiest way to do it is to simply use the return value correctly, something like:

int charInt;
while ((charInt = fgetc(inputHandle)) != EOF)
    doSomethingWith(charInt);
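Applied to the copy program from the question, that might look like the following sketch. The function name copy_file and its return convention are assumptions for illustration. It also opens both files in binary mode ("rb"/"wb"), which makes no difference on Linux but matters on platforms that translate line endings in text mode:

```c
#include <stdio.h>

/* Hypothetical helper: copies src_path to dst_path byte by byte.
   Returns 0 on success, -1 on any open, read or write error. */
int copy_file(const char *src_path, const char *dst_path)
{
    FILE *in = fopen(src_path, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    int status = 0;
    int c;                              /* int, not char: must hold EOF too */
    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF) {
            status = -1;
            break;
        }
    }
    if (ferror(in))                     /* EOF can also mean a read error */
        status = -1;

    fclose(in);
    if (fclose(out) != 0)               /* buffered data may fail to flush */
        status = -1;
    return status;
}
```

A call such as copy_file("source.pdf", "target.pdf") would then replace the body of main in the question.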

Well, the thing is that most of the code you saw is wrong. There are 3 types of char: signed char, unsigned char and plain char. Now, if plain char is signed by default, then a character with decimal value 255 will be considered equal to -1 (EOF), so the loop stops at the first 0xFF byte in the file. This is not what you want. (Yes, decimal value 255 is not representable in a signed char, but the conversion is implementation-defined behavior, and on most implementations it will store the bit pattern 0xFF in the char.)

Secondly, if char is unsigned, then EOF will be stored as 0xFF, which is also wrong: the comparison with EOF would always fail, so the loop would never end. (EOF is a negative int, typically -1, so it is converted to CHAR_MAX, which is 255 or 0xFF when plain char is unsigned and 8 bits wide.)
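Both cases can be checked with a small sketch like the one below. The function names are hypothetical, and the first result assumes a typical two's complement platform with 8-bit chars where EOF is -1:

```c
#include <stdio.h>

/* Returns 1 if the data byte 0xFF, once stored in a signed char,
   compares equal to EOF. */
int signed_char_matches_eof(void)
{
    /* Implementation-defined conversion; yields -1 on typical platforms. */
    signed char ch = (signed char)0xFF;
    return ch == EOF;                   /* ch promotes to int -1 */
}

/* Returns 1 if EOF, once stored in an unsigned char, still compares
   equal to EOF. */
int unsigned_char_matches_eof(void)
{
    /* Well-defined conversion: the negative value wraps to UCHAR_MAX. */
    unsigned char ch = (unsigned char)EOF;
    return ch == EOF;                   /* ch promotes to a non-negative int */
}
```

On such a platform the first function reports a match for an ordinary data byte and the second never reports a match at all. Some compilers warn that the second comparison is always false, which is exactly the point.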

That's why int is used: it can hold every character value as well as EOF, and that is how you should use it.
