
Why is the carriage return character not considered a white-space character by the preprocessor?

Section 6.4, Lexical elements, of the C Standard says:

  1. ... Preprocessing tokens can be separated by white space; this consists of comments (described later), or white-space characters (space, horizontal tab, new-line, vertical tab, and form-feed), or both.

As can be seen, the carriage return character is not included among the white-space characters.

On the other hand, the description of the standard C function isspace (7.4.1.10 The isspace function) says:

  1. ...The standard white-space characters are the following: space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). In the "C" locale, isspace returns true only for the standard white-space characters.

Is the carriage return character intentionally left out of the section describing preprocessing, and if so, what is the reason?

Or is it just a defect in the Standard?

The same questions apply to the C++ Standard.

See N1570 5.2.1 paragraph 3.

The carriage return character is a member of the basic execution character set (and it is treated by isspace() as a white-space character), but it's not part of the basic source character set.

The source and execution basic character sets both include "the space character, and control characters representing horizontal tab, vertical tab, and form feed". In addition, "In the basic execution character set, there shall be control characters representing alert, backspace, carriage return, and new line".
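The difference between the two lists can be checked with a small sketch. The helper names is_pp_whitespace and only_in_isspace are hypothetical, chosen for this illustration; the first simply hard-codes the five characters listed in 6.4, and the second reports whether isspace() accepts a character that the 6.4 list does not. It assumes the program runs in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: the white-space characters that 6.4 lists as
   separators of preprocessing tokens (comments are not modelled here). */
bool is_pp_whitespace(int c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f';
}

/* Hypothetical helper: true when isspace() accepts c but the 6.4 list
   does not. isspace() requires a value representable as unsigned char
   (or EOF), so the range is checked first. */
bool only_in_isspace(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);
    return isspace(c) != 0 && !is_pp_whitespace(c);
}
```

In the "C" locale, only_in_isspace('\r') should be true, while it should be false for space, '\t', '\n', '\v' and '\f'.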

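On some systems, the carriage return character is part of the indication of an end-of-line; any such indication is treated as a single new-line. A carriage return character that's not part of an end-of-line indicator in a source file causes undefined behavior, unless it appears in one of the contexts that 5.2.1 paragraph 3 exempts (an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token).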
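As a rough illustration, a tool could scan a source buffer for carriage returns that are not part of an end-of-line indicator. This is only a sketch: find_stray_cr is a hypothetical name, and it assumes that LF and CR LF are the only end-of-line indicators on the host, which the standard does not guarantee. It also ignores the exempt contexts such as comments and string literals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the index of the first carriage return
   that is not immediately followed by a line feed, or -1 if none exists.
   Assumption: end-of-line is indicated by LF or by CR LF only. */
long find_stray_cr(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && (i + 1 == len || buf[i + 1] != '\n'))
            return (long)i;
    }
    return -1;
}
```

With these assumptions, find_stray_cr("a\r\nb", 4) reports no stray carriage return, while find_stray_cr("a\rb", 3) points at index 1.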

The source file input gets translated into the source character set (translation phase 1 in §5.1.1.2 of the standard). The source character set is described in §5.2.1.

In C.2011, §5.2.1¶3:

In source files, there shall be some way of indicating the end of each line of text; this International Standard treats such an end-of-line indicator as if it were a single new-line character.

A bare carriage return is not part of the basic source character set. If it appears as part of a line termination sequence, it gets translated into a single new-line before the C preprocessor begins to do its work.
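The mapping of end-of-line indicators can be pictured with a minimal sketch. How a real implementation performs translation phase 1 is implementation-defined; the function below, with the hypothetical name normalize_eol, only assumes a host where a line ends in CR LF and rewrites each such pair to a single new-line in place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: rewrites every CR LF pair in buf to a single LF,
   in place, and returns the new length. A CR that is not followed by LF
   is left untouched. Assumption: CR LF is the host's end-of-line indicator. */
size_t normalize_eol(char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;  /* drop the CR; the LF that follows is copied next */
        buf[out++] = buf[i];
    }
    return out;
}
```

After this step the later phases never see the carriage return of a CR LF pair, which is consistent with it being absent from the white-space list in 6.4.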
