
carriage return by fgets

I am running the following code:

#include <stdio.h>
#include <stdlib.h>   /* needed for exit() */
#include <string.h>
#include <io.h>

int main(){
    FILE *fp;
    if((fp=fopen("test.txt","r"))==NULL){
        printf("File can't be read\n");
        exit(1);
    }
    char str[50];
    fgets(str,50,fp);
    printf("%s",str);
    return 0;
}

test.txt contains: I am a boy\r\n

Since I am on Windows, it takes \r\n as a newline character, so if I read this from a file it should store "I am a boy\n\0" in str, but I get "I am a boy\r\n". I am using the mingw compiler.

The behavior depends on the C library implementation and on which mode you pass to fopen. See this quote from the MSDN documentation on fopen:

b - Open in binary (untranslated) mode; translations involving carriage-return and linefeed characters are suppressed.

This means that if you use the Microsoft C library and open your file without the 'b', carriage return-linefeed pairs are translated into single linefeeds on input, so the carriage return characters are removed from the stream.

Since you're using mingw, your compiler probably links against the GNU C library, which follows the POSIX standard. This is what the GNU documentation on gnu.org says about fopen:

The character 'b' in opentype has a standard meaning; it requests a binary stream rather than a text stream. But this makes no difference in POSIX systems (including GNU systems).

In conclusion: you're omitting the 'b' mode character, which opens your stream in text mode. You're on Windows but use a GNU C library, which makes no distinction between text and binary mode. This is why fgets reads both the carriage return and the newline.
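Whichever C library ends up being linked, one way to check what a given mode does is to read the same line in both modes and look at the raw bytes. The following is only a minimal sketch; read_line_with_mode and dump_bytes are hypothetical helper names, not part of the original post or of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original post): open `path` with the
 * given fopen mode, read the first line into buf (capacity n), and close
 * the file. Returns 0 on success, -1 if the open or the read fails. */
int read_line_with_mode(const char *path, const char *mode, char *buf, int n)
{
    assert(path != NULL && mode != NULL && buf != NULL && n > 0);
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    int ok = fgets(buf, n, fp) != NULL;
    fclose(fp);
    return ok ? 0 : -1;
}

/* Hypothetical helper: print each byte in hex so that a '\r' (0d)
 * in front of the '\n' (0a) becomes visible. */
void dump_bytes(const char *label, const char *s)
{
    printf("%s:", label);
    for (; *s != '\0'; ++s)
        printf(" %02x", (unsigned char)*s);
    printf("\n");
}
```

Calling read_line_with_mode("test.txt", "r", str, 50) and then read_line_with_mode("test.txt", "rb", str, 50), each followed by dump_bytes, shows the difference directly. In "rb" mode the line always ends in 0d 0a; in "r" mode it ends in 0a alone only if the library translates \r\n on input.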

Since I am on Windows, it takes \r\n as a new line character...

This assumption is wrong. The C standard treats carriage return and new-line as two different things, as evidenced in C99 §5.2.1/3 (Character sets):

[...] In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new-line. [...]

The fgets function is described as follows in C99 §7.19.7.2/2:

The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.

Therefore, when encountering the string I am a boy\r\n, a conforming implementation should read up to and including the \n character. There is no sane reason for the implementation to discard \r based on the platform.
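The quoted fgets rules (at most n - 1 characters, stop after a retained new-line) can be observed on a stream where no translation happens at all. This is a minimal sketch, assuming tmpfile() succeeds on the system at hand; first_line_binary is a hypothetical name chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original post): write `bytes` to a
 * temporary file, rewind, and read the first line back with fgets into
 * buf (capacity n). tmpfile() opens the file in "wb+" mode, so the bytes
 * are not translated. Returns 0 on success, -1 on failure. */
int first_line_binary(const char *bytes, char *buf, int n)
{
    assert(bytes != NULL && buf != NULL && n > 0);
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    size_t len = strlen(bytes);
    int ok = fwrite(bytes, 1, len, fp) == len;
    rewind(fp);                       /* required between writing and reading */
    ok = ok && fgets(buf, n, fp) != NULL;
    fclose(fp);                       /* the temporary file is removed here */
    return ok ? 0 : -1;
}
```

With a 50-byte buffer and the input "I am a boy\r\nsecond\r\n", buf holds "I am a boy\r\n": reading stops after the \n, and the \r in front of it is kept like any other character. With a 5-byte buffer the same call yields "I am", because only n - 1 characters are read.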

The C standard says this about text streams (among other things):

Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character.

In other words, if a file is opened in text mode, an implementation is free to add, remove, and modify control characters as it wants or needs to when going to and from disk. This is apparently what the Microsoft implementation does with the carriage return, and what the GNU implementation doesn't do.
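Because the standard leaves this translation to the implementation, code that must behave the same everywhere can avoid relying on either behavior and trim the line ending itself after fgets returns. A minimal sketch, assuming that dropping everything from the first \r or \n onward is acceptable; chomp_line is a hypothetical name, not a standard function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): cut the line at the
 * first '\r' or '\n', so "I am a boy\r\n" and "I am a boy\n" both become
 * "I am a boy". Returns the new length of the string. */
size_t chomp_line(char *line)
{
    assert(line != NULL);
    size_t len = strcspn(line, "\r\n"); /* index of first CR or LF, else strlen */
    line[len] = '\0';
    return len;
}
```

Called right after fgets(str, 50, fp), chomp_line(str) leaves the same text in str whether or not the library stripped the carriage return.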
