
carriage return by fgets

I am running the following code:

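#include <stdio.h>
#include <stdlib.h>   /* exit */
#include <string.h>

int main(void){
    FILE *fp;
    if((fp=fopen("test.txt","r"))==NULL){
        printf("File can't be read\n");
        exit(1);
    }
    char str[50];
    /* fgets returns NULL on error or end-of-file, so check before printing */
    if(fgets(str,50,fp)!=NULL)
        printf("%s",str);
    fclose(fp);
    return 0;
}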

test.txt contains: I am a boy\r\n

Since I am on Windows, it treats \r\n as the newline sequence, so when I read this line from the file I expect str to hold "I am a boy\n\0", but I get "I am a boy\r\n". I am using the MinGW compiler.

The behavior depends on the C library implementation and on the mode you pass to fopen. See this quote from the MSDN documentation on fopen (fopen on MSDN):

b - Open in binary (untranslated) mode; translations involving carriage-return and linefeed characters are suppressed.

This means that if you use the Microsoft C library and open your file without the 'b', carriage return/linefeed pairs are translated to a single linefeed on input, so the carriage return never reaches your buffer.

Since you're using MinGW, your compiler probably links against the GNU C library, which follows the POSIX standard. This is what the GNU documentation says about fopen (fopen on gnu.org):

The character 'b' in opentype has a standard meaning; it requests a binary stream rather than a text stream. But this makes no difference in POSIX systems (including GNU systems).

To conclude: you're omitting the 'b' mode character, which opens your stream in text mode. You're on Windows, but if you are indeed using a GNU C library, it makes no distinction between text and binary mode. That would explain why fgets reads both the carriage return and the newline.

Since I am on Windows, it takes \r\n as a new line character...

This assumption is wrong. The C standard treats carriage return and new line as two different things, as evidenced in C99 §5.2.1/3 (Character sets):

[...] In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new-line. [...]

The fgets function description is as follows, in C99 §7.19.7.2/2:

The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.

Therefore, when encountering the string I am a boy\r\n, a conforming implementation should read up to and including the \n character. There is no sane reason why the implementation should discard \r based on the platform.

The C standard says this about text streams (among other things):

Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character.

In other words, if a file is opened in text mode, an implementation is free to add, remove and modify control characters as it wants or needs to when going to and from disk. That is apparently what the Microsoft implementation does with the carriage return, while the GNU implementation does not.
