Why does fgetc() return int instead of char?
I would like to copy a binary file, source, to a file, target. Nothing more! The code is inspired by many examples found on the Internet.
#include <stdio.h>
int main(int argc, char **argv) {
    FILE *fp1, *fp2;
    char ch;
    fp1 = fopen("source.pdf", "r");
    fp2 = fopen("target.pdf", "w");
    while((ch = fgetc(fp1)) != EOF)
        fputc(ch, fp2);
    fclose(fp1);
    fclose(fp2);
    return 0;
}
The result differs in file size.
root@vm:/home/coder/test# ls -l
-rwxr-x--- 1 root root 14593 Feb 28 10:24 source.pdf
-rw-r--r-- 1 root root 159 Mar 1 20:19 target.pdf
OK, so what's the problem?
I know that char is unsigned and becomes signed when above 80. See here.
This is confirmed when I use printf("%x\n", ch); which, about 50% of the time, prints something like FFFFFFE1.
The solution to my issue would be to use int instead of char.
Examples found with char: example 1, example 2, example 3, example 4, ...
Examples found with int: example a, ...
I don't use fancy compiler options.
Why do virtually all the code examples I found assign the return value of fgetc() to a char instead of an int, which would be more correct?
What am I missing?
ISO C mandates that fgetc() returns an int since it must be able to return every possible character in addition to an end-of-file indicator.
So code that places the return value into a char, and uses it to detect EOF, is generally plain wrong and should not be used.
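A minimal sketch of why the wider return type matters, assuming a hosted environment where tmpfile() works. The helper name read_ff_then_eof is hypothetical and exists only for illustration: it writes a single 0xFF byte, reads it back, and records what fgetc() returns for the data byte and for the end of the file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes one 0xFF byte to a temporary file, reads it
   back, and stores the two fgetc() results in *first and *second.
   Returns 0 on success, -1 on an I/O error. */
int read_ff_then_eof(int *first, int *second)
{
    FILE *fp;

    assert(first != NULL && second != NULL);

    fp = tmpfile();               /* opened in binary update mode */
    if (fp == NULL)
        return -1;
    if (fputc(0xFF, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    *first = fgetc(fp);           /* data byte: unsigned char converted to int */
    *second = fgetc(fp);          /* nothing left: EOF, a negative int */

    fclose(fp);
    return 0;
}
```

Kept as int, the data byte comes back as 255 and the end-of-file indicator as EOF, so the two stay distinct. Squeezing both into a char is what loses that distinction.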
Having said that, two of the examples you gave don't actually do that.
One of them uses fseek and ftell to get the number of bytes in the file and then uses that to control the read/write loop. That could be problematic since the file can actually change in size after the size is retrieved, but that's a different problem from trying to force an int into a char.
The other uses feof immediately after the character is read to check if the end of file has been reached.
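The linked example is not reproduced here, so the following is only a sketch of the general feof pattern, not that example's actual code. The function name count_bytes_with_feof is hypothetical. Because the end of the file is detected through the stream's own indicator rather than through the stored value, a 0xFF byte held in a char does not end the loop early.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the bytes in fp, storing each one in a char
   but asking the stream itself whether the read hit end-of-file. */
long count_bytes_with_feof(FILE *fp)
{
    long n = 0;

    assert(fp != NULL);
    for (;;) {
        char ch = (char)fgetc(fp);   /* narrowing is harmless here ...      */
        if (feof(fp) || ferror(fp))  /* ... since the stream state decides  */
            break;
        (void)ch;                    /* a real loop would use ch here       */
        n++;
    }
    return n;
}
```

This works, but it needs an extra call per byte and is easy to get subtly wrong, for example by testing feof before the read instead of after it.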
But you're correct in that the easiest way to do it is to simply use the return value correctly, something like:
int charInt;
while ((charInt = fgetc(inputHandle)) != EOF)
    doSomethingWith(charInt);
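Applied to the copy task in the question, that loop might look like the sketch below. The function name copy_stream and its return convention are assumptions made for illustration, not part of the original code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies every byte from in to out.
   Returns the number of bytes copied, or -1 on a read or write error. */
long copy_stream(FILE *in, FILE *out)
{
    int c;              /* int, not char: holds every byte value plus EOF */
    long copied = 0;

    assert(in != NULL && out != NULL);
    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF)
            return -1;  /* write error */
        copied++;
    }
    /* EOF is returned for both end-of-file and read errors; tell them apart. */
    return ferror(in) ? -1 : copied;
}
```

A caller would open the files with fopen("source.pdf", "rb") and fopen("target.pdf", "wb"), check both results for NULL, and then call copy_stream(fp1, fp2). The "b" in the mode makes no difference on POSIX systems, but it avoids text-mode translation on platforms that distinguish text and binary streams.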
Well, the thing is that most of the code you saw is wrong. There are 3 char types: signed char, unsigned char and plain char. Now if plain char is signed by default, then a character with decimal value 255 will be considered equal to -1 (EOF). This is not what you want. (Yes, decimal value 255 is not representable in a signed char, but the conversion is implementation-defined behavior and most implementations will store the bit pattern 0xFF in the char.)
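The signed case can be reproduced with a sketch like the following. The function name count_until_stop_signed is hypothetical, and signed char is used explicitly to mimic a platform where plain char is signed. The result relies on two assumptions that hold on common two's-complement platforms but are not guaranteed by the standard: EOF is -1, and converting 255 to signed char yields -1.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts how many bytes are read before the loop stops
   when the result of fgetc() is stored in a signed char. */
long count_until_stop_signed(FILE *fp)
{
    signed char ch;     /* explicitly signed to mimic a signed plain char */
    long n = 0;

    assert(fp != NULL);
    /* A 0xFF data byte is narrowed to -1 (implementation-defined), which is
       then indistinguishable from EOF when EOF is -1. */
    while ((ch = (signed char)fgetc(fp)) != EOF)
        n++;
    return n;
}
```

On a file containing 'a', 'b', 0xFF, 'c', the loop stops after two bytes. The 159-byte target.pdf in the question would be consistent with the first 0xFF byte of source.pdf sitting at offset 159.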
Secondly, if char is unsigned, then EOF will be stored as 0xFF, which is also wrong: the stored value is promoted to the non-negative int 255 in the comparison, so it never equals EOF and the loop never ends. (Assuming EOF is -1, it is converted to CHAR_MAX, which is 255 or 0xFF for an 8-bit unsigned char.)
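The unsigned case can be checked without any file at all. In this sketch the function name unsigned_char_can_match_eof is hypothetical, and the comment about the stored value assumes EOF is -1 and that int is wider than char, as on typical platforms.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if EOF stored in an unsigned char still
   compares equal to EOF afterwards, 0 otherwise. */
int unsigned_char_can_match_eof(void)
{
    unsigned char ch = (unsigned char)EOF;  /* UCHAR_MAX when EOF is -1 */
    int promoted = ch;                      /* what the comparison sees */

    assert(EOF < 0);                        /* guaranteed by the C standard */
    /* promoted is never negative, so it cannot equal the negative EOF. */
    return promoted == EOF;
}
```

The function returns 0, which is why a loop of the form while ((ch = fgetc(fp)) != EOF) with an unsigned ch has no way to terminate.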
That's why int is used: it can hold the value of EOF correctly alongside every character value, and that is how you should use it.