
What is the proper format of writing raw strings with '$' in C++?

I'm learning about raw strings in C++ from a cplusplus.com tutorial on constants. Based on the definition on that site, a raw string should start with R"sequence( and end with )sequence", where sequence can be any sequence of characters.

One of the examples on the website is the following:

R"&%$(string with \backslash)&%$"

However, when I try to compile code that contains the above raw string, I get a compilation error.

test.cpp:5:28: error: invalid character '$' in raw string delimiter
    5 |     std::string str = R"&%$(string with \backslash)&%$";
      |                       ^
test.cpp:5:23: error: stray 'R' in program

I tried it with g++ and clang++ on both Windows and Linux. Neither of them accepted it.

From C++ reference:

delimiter: A character sequence made of any source character but parentheses, backslash and spaces (can be empty, and at most 16 characters long)

Note the "any source character" part here.
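To make the "can be empty, and at most 16 characters long" part concrete, here is a small sketch of three delimiter choices. The function names are made up for illustration and are not from the quoted reference.

```cpp
#include <string>

// Empty delimiter: the literal ends at the first )" sequence.
inline std::string empty_delimiter() {
    return R"(C:\temp\new)";
}

// Non-empty delimiter: useful when the text itself contains )".
inline std::string custom_delimiter() {
    return R"xy(say "(hi)" now)xy";
}

// Exactly 16 delimiter characters, the maximum allowed.
inline std::string longest_delimiter() {
    return R"0123456789abcdef(ok)0123456789abcdef";
}
```

Each function returns only the text between the parentheses; the delimiter itself never becomes part of the string.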

Let us look at what the standard says:

From [gram.lex]:

raw-string:
" d-char-sequence opt ( r-char-sequence opt ) d-char-sequence opt "

...

d-char-sequence:
d-char
d-char-sequence d-char

d-char:
any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline.

Well, what is the basic source character set? From [lex.charset]:

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

... which does not include $; so the conclusion is that the dollar sign $ cannot be part of the delimiter sequence.
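The rule quoted above can be turned into a small compile-time check. This is only a sketch under a few assumptions: it needs C++17 (for std::string_view), the names kBasicGraphical, is_d_char and is_valid_raw_delimiter are hypothetical, and it mirrors the wording quoted in this answer rather than what any particular compiler or later revision of the standard does.

```cpp
#include <cstddef>
#include <string_view>

// The 91 graphical characters of the basic source character set, as listed
// in the [lex.charset] wording quoted above. '$' is not among them.
constexpr std::string_view kBasicGraphical =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

// A d-char is any of those characters except '(', ')' and '\'.
// Space and the control characters are excluded as well, but they are not
// part of kBasicGraphical in the first place.
constexpr bool is_d_char(char c) {
    if (c == '(' || c == ')' || c == '\\') return false;
    return kBasicGraphical.find(c) != std::string_view::npos;
}

// Hypothetical helper: true if `delim` satisfies the quoted delimiter rules
// (only d-chars, at most 16 characters, possibly empty).
constexpr bool is_valid_raw_delimiter(std::string_view delim) {
    if (delim.size() > 16) return false;
    for (char c : delim) {
        if (!is_d_char(c)) return false;
    }
    return true;
}

static_assert(kBasicGraphical.size() == 91);
static_assert(is_valid_raw_delimiter("&%"));
static_assert(!is_valid_raw_delimiter("&%$"));
```

The two final static_assert lines correspond to the question: the delimiter &% passes the check, while &%$ fails because of the dollar sign.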

For the basic source character set, see lex.charset 5.3 (1): that set does not contain the $ character. For the allowed prefix characters in raw string literals, see lex.string 5.13.5: "/…/ any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline." (emphasis mine).

Just remove the $, as in the code below:

string string3 = R"&%(string with \backslash)&%";
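As a quick sanity check, the raw literal with the shortened delimiter should hold the same text as an ordinary literal with the backslash escaped. A minimal sketch follows; the two function names are invented for this comparison.

```cpp
#include <string>

// Raw literal with the delimiter &% : the backslash is kept as written.
inline std::string raw_version() {
    return R"&%(string with \backslash)&%";
}

// The same text as an ordinary literal, where the backslash must be escaped.
inline std::string escaped_version() {
    return "string with \\backslash";
}
```

Comparing raw_version() == escaped_version() yields true, since the delimiter characters are not part of the resulting string.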

The $ causes an error because, as noted in the comments, the basic source character set does not include $.

  1. The individual bytes of the source code file are mapped (in implementation-defined manner) to the characters of the basic source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters. The basic source character set consists of 96 characters:

a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)

b) 10 digit characters from '0' to '9'

c) 52 letters from 'a' to 'z' and from 'A' to 'Z'

d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

  2. Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (escaped with \u or \U) or by some implementation-defined form that is handled equivalently.

