
What characters are legal to use in string literals?

I am wondering whether it is legal in C to put ASCII control characters such as TAB, BEL and ESC directly, as raw characters, into a string literal.

There is no way to display these characters in plain text here on Stack Overflow, so I had to take a screenshot instead.

[Screenshot: example]

Characters that do not have a graphical representation are displayed using caret notation and highlighted in purple in the screenshot. There is also a TAB character on line 7 that indents the text.

This compiles without any warnings using gcc -std=c99 -pedantic, but is it really fully portable?

This is not something that I would use in any serious program. I am just curious whether the standards allow it.

The portable characters that can appear in the program source are exactly these:

  • the 26 uppercase letters of the Latin alphabet

     ABCDEFGHIJKLMNOPQRSTUVWXYZ
  • the 26 lowercase letters of the Latin alphabet

     abcdefghijklmnopqrstuvwxyz
  • the 10 decimal digits

     0 1 2 3 4 5 6 7 8 9
  • the following 29 graphic characters

     ! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
  • the space character, and control characters representing horizontal tab, vertical tab, and form feed.

Source: the C standard, any version.

An implementation must accept these characters, and is allowed to accept any additional characters.
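As a rough illustration of the list above, the membership test can be sketched in C. The function name `is_basic_source_char` is hypothetical and chosen only for this example; the sketch simply encodes the characters listed in this answer.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if c is one of the characters
   listed above as portable in program source. */
bool is_basic_source_char(int c)
{
    static const char members[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
        " \t\v\f";

    /* strchr also matches the terminating '\0', so exclude it first. */
    return c != '\0' && strchr(members, c) != NULL;
}
```

With this sketch, `is_basic_source_char('\t')` is true, while `is_basic_source_char('\a')` and `is_basic_source_char('$')` are false.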

If a backslash is immediately followed by a literal newline character (an actual line break, not \n), both the backslash and the newline are removed. Lines can be split like this anywhere except inside a trigraph: trigraphs are replaced in an earlier translation phase than line splicing, so if a backslash-newline sequence splits a trigraph, the sequence is still removed but the resulting characters are not treated as a trigraph.
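A minimal sketch of backslash-newline splicing inside a string literal follows. The function names are made up for illustration. Note that the continuation line starts in column one, because any leading whitespace on it would become part of the string.

```c
/* The backslash-newline pair is deleted before the literal is
   tokenized, so the two physical lines form one string. */
const char *spliced_message(void)
{
    return "Hello, \
world";
}

/* Adjacent string literals are concatenated, which is usually the
   more readable way to break a long string across lines. */
const char *concatenated_message(void)
{
    return "Hello, "
           "world";
}
```

Both functions return a string equal to "Hello, world".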

A literal tab character is allowed in a string literal in portable code and has the same meaning as \t. C11 (N1570) 6.4.5 p1 states that "any member of the source character set except the double-quote ", backslash \, or new-line character" can be part of a string literal, and the tab character is part of the basic source character set (ibid. 5.2.1 p3).

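The escape character (ASCII 0x1b, written \e as an extension in some compilers such as GCC) is not part of the basic source character set and may not exist at all on a non-ASCII system. A literal BEL (alert) character is likewise not part of the basic source character set, even though the escape sequence \a is part of the C standard. These characters cannot be used literally in portable code.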
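For the BEL and ESC characters from the question, one way to avoid raw control characters in the source is to spell them with escape sequences. This is only a sketch: the variable names are invented for the example, and the octal value 033 assumes an ASCII-compatible execution character set.

```c
/* BEL: \a is a standard escape sequence. */
const char bel_string[] = "\a";

/* ESC: there is no standard escape sequence for it. Octal 033 is
   code 27, which is ESC only in ASCII-compatible character sets
   (an assumption of this example). */
const char esc_string[] = "\033";

/* An octal escape ends after at most three digits, so the "[2J"
   that follows is not absorbed into the escape. */
const char clear_screen[] = "\033[2J";
```

The octal form is used here rather than \x1b because a hexadecimal escape consumes every hex digit that follows it, which can silently change the meaning of a literal such as "\x1bcd".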

An implementation is free to accept any additional characters beyond the minimum the standard requires. The mapping from the source character set to the execution character set is implementation-defined; an implementation may even map different source characters to the same execution character.

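A null-terminated string is simply a sequence of char values ending in a zero byte. A char is 8 bits wide on most platforms (the standard requires at least 8), so each value lies in 0 to 255 or -128 to 127, depending on whether plain char is unsigned or signed.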
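The signedness point can be checked with `<limits.h>`. The helper names below are hypothetical, and the sketch only shows one common way to read a char as a non-negative byte value.

```c
#include <limits.h>
#include <stdbool.h>

/* Plain char is signed exactly when CHAR_MIN is negative. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Converting to unsigned char first yields a value in the range
   0..UCHAR_MAX regardless of the signedness of plain char. */
unsigned byte_value(char c)
{
    return (unsigned char)c;
}
```

Going through `unsigned char` like this is also the usual precaution before passing a char to the `<ctype.h>` functions.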

When you send your bytes to something like a terminal, it is entirely up to the terminal what to do with them. Some bytes, such as those for 'a' to 'z', might be standard, but only if you assume an 8-bit character encoding. Other characters, such as '€', can be displayed correctly only with the right character set.

Finally, there are the terminal control bytes that move the cursor and ring the bell. How those bytes are handled is entirely up to the terminal, but writing them is still valid C code.
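As an illustration of writing such control bytes, here is a small sketch. It assumes an ASCII-based terminal that understands ANSI escape sequences; the function name `print_bold_with_bell` is made up for the example.

```c
#include <stdio.h>

/* Hypothetical helper: writes text between the ANSI "bold on" and
   "reset" sequences, then a BEL and a newline. Whether the text
   appears bold or the bell rings is up to the terminal.
   Returns fprintf's result: the number of characters written,
   or a negative value on error. */
int print_bold_with_bell(FILE *out, const char *text)
{
    return fprintf(out, "\033[1m%s\033[0m\a\n", text);
}
```

Calling `print_bold_with_bell(stdout, "hi")` sends the same bytes to any output; a terminal that does not understand the sequences may simply show them as garbage.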
