carriage return by fgets
I am running the following code:
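#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <string.h>

int main(void) {
    FILE *fp;
    char str[50];

    if ((fp = fopen("test.txt", "r")) == NULL) {
        printf("File can't be read\n");
        exit(1);
    }
    /* fgets keeps the trailing new-line, if one was read */
    if (fgets(str, 50, fp) != NULL)
        printf("%s", str);
    fclose(fp);
    return 0;
}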
test.txt contains:
I am a boy\r\n
Since I am on Windows, it takes \r\n as a new line character, so if I read this from a file it should store "I am a boy\n\0" in str, but I get "I am a boy\r\n". I am using the mingw compiler.
The behavior depends on the C library implementation and on which mode you pass to fopen. See this quote from the MSDN documentation on fopen (fopen on MSDN):
b - Open in binary (untranslated) mode; translations involving carriage-return and linefeed characters are suppressed.
This means that if you use the Microsoft C library and open your file without the 'b', the carriage return characters will be removed from the stream.
Since you're using mingw, your compiler probably links against the GNU C library, which follows the POSIX standard. This is what the GNU documentation says about fopen (fopen on gnu.org):
The character 'b' in opentype has a standard meaning; it requests a binary stream rather than a text stream. But this makes no difference in POSIX systems (including GNU systems).
In conclusion: you're omitting the 'b' mode character, which opens your stream in text mode. You're on Windows but use a GNU C library, which makes no distinction between text and binary mode. This is why fgets reads both the carriage return and the new line.
Since I am on Windows, it takes \r\n as a new line character...
This assumption is wrong. The C standard treats carriage return and new line as two different things, as evidenced in C99 §5.2.1/3 (Character sets):
[...] In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new-line. [...]
The fgets function is described as follows in C99 §7.19.7.2/2:
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
Therefore, when encountering the string I am a boy\r\n, a conforming implementation should read up to and including the \n character. There is no sane reason why the implementation should discard \r based on the platform.
The C standard says this about text streams, among other things:
Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character.
In other words, if a file is opened in text mode, an implementation is free to add, remove, and modify control characters as it needs to when going to and from disk. That is apparently what the Microsoft implementation does with the carriage return, while the GNU implementation does not.