How does the C compiler parse the following C statement?

Consider the following lines:

int i;
printf("%d",i);

Will the lexical analyzer go into the string to parse % and d as separate tokens, or will it parse "%d" as one token?

There are two parsers at work here. The first is the C compiler, which parses the C file and basically ignores the content of the string (though modern C compilers also parse the string to help catch bad format strings, that is, mismatches between a % conversion specifier and the corresponding argument passed to printf() to be converted).

The next parser is the string format parser built into the C runtime library. It is called at runtime to parse the format string when you call printf. This parser is of course very simple in comparison.
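To make the runtime side concrete, here is a minimal sketch of the kind of scan a printf-style routine performs over the format string. The function count_conversions is hypothetical, not part of the C library, and it is deliberately simplified: it only counts conversion specifications, treats "%%" as a literal percent sign, and skips flags, width, precision and length modifiers without validating them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration, not a library call):
   counts the conversion specifications in a printf-style format string. */
size_t count_conversions(const char *fmt)
{
    size_t count = 0;

    assert(fmt != NULL);
    while (*fmt != '\0') {
        if (*fmt++ != '%')
            continue;               /* ordinary character, copied as-is */
        if (*fmt == '%') {
            fmt++;                  /* "%%" stands for one literal '%' */
            continue;
        }
        /* Skip flags, width, precision and length modifiers unchecked. */
        fmt += strspn(fmt, "-+ #0123456789.*hlLjzt");
        if (*fmt != '\0') {
            count++;                /* the conversion character itself */
            fmt++;
        }
    }
    return count;
}
```

Called on the string from the question, count_conversions("%d") finds one conversion. The point is that this work happens on the characters inside the string at run time, long after the compiler has treated the whole literal as one token.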

I have not checked, but I would guess that the C compilers that help check for bad format strings implement a printf-like parser as a post-processing step (i.e. using their own lexer).
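As an illustration of that compile-time checking, GCC and Clang let you request it for your own functions through the format function attribute. The sketch below assumes one of those compilers; the PRINTF_LIKE macro and the format_into wrapper are hypothetical names introduced here, and on other compilers the macro expands to nothing.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* GCC/Clang extension: check calls to the annotated function the way
   calls to printf are checked. Elsewhere the macro is empty. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_index, first_arg) \
    __attribute__((format(printf, fmt_index, first_arg)))
#else
#define PRINTF_LIKE(fmt_index, first_arg)
#endif

/* format_into is a hypothetical wrapper around vsnprintf. */
int format_into(char *buf, size_t size, const char *fmt, ...) PRINTF_LIKE(3, 4);

int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int written;

    assert(buf != NULL && fmt != NULL);
    va_start(args, fmt);
    written = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return written;
}

/* With format warnings enabled (for example -Wall on GCC), a call such as
       format_into(buf, sizeof buf, "%d", "text");
   should be flagged, because %d does not match a char * argument. */
```

A matching call such as format_into(buf, sizeof buf, "%d", 42) compiles without a format warning and writes "42" into buf. Either way, the check is an extra analysis of the literal's contents and does not change how the literal is tokenized.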

A string literal is a single token. The above code will be tokenized like this:

int     keyword "int"
i       identifier
;       semicolon
printf  identifier
(       open paren
"%d"    string literal
,       comma
i       identifier
)       closing paren
;       semicolon
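The token list above can be reproduced with a toy lexer. This is only a sketch under strong assumptions: next_token and count_tokens are hypothetical names, and the lexer knows nothing about comments, numbers, character constants or multi-character punctuators. It does show the relevant point, which is that everything between the quotes is consumed as one token.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Toy token categories; keywords and identifiers share TOK_WORD here. */
enum token_kind { TOK_END, TOK_WORD, TOK_STRING, TOK_PUNCT };

/* Reads one token from *src, advances *src past it, and reports where
   the token starts and how long it is. Hypothetical helper. */
enum token_kind next_token(const char **src, const char **start, size_t *len)
{
    const char *p;
    enum token_kind kind;

    assert(src != NULL && *src != NULL && start != NULL && len != NULL);
    p = *src;
    while (isspace((unsigned char)*p))
        p++;                            /* white space separates tokens */
    *start = p;

    if (*p == '\0') {
        kind = TOK_END;
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (isalnum((unsigned char)*p) || *p == '_')
            p++;
        kind = TOK_WORD;
    } else if (*p == '"') {
        p++;                            /* opening quote */
        while (*p != '\0' && *p != '"') {
            if (*p == '\\' && p[1] != '\0')
                p++;                    /* skip the escaped character */
            p++;
        }
        if (*p == '"')
            p++;                        /* closing quote */
        kind = TOK_STRING;              /* contents are not inspected */
    } else {
        p++;                            /* single-character punctuator */
        kind = TOK_PUNCT;
    }

    *len = (size_t)(p - *start);
    *src = p;
    return kind;
}

/* Counts the tokens in src, not including the end marker. */
size_t count_tokens(const char *src)
{
    const char *start;
    size_t len;
    size_t count = 0;

    while (next_token(&src, &start, &len) != TOK_END)
        count++;
    return count;
}
```

Feeding it the two lines from the question yields ten tokens, and the sixth one is a TOK_STRING of length 4, namely "%d" including its quotes.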

"%d" is a string literal, and it will be seen as one token by both the C preprocessor and the compiler. We can see this by going to the draft C99 standard, section 6.4 Lexical elements, which defines the following tokens:

token:
  keyword
  identifier
  constant
  string-literal
  punctuator

and the following preprocessing tokens:

preprocessing-token:
  header-name
  identifier
  pp-number
  character-constant
  string-literal
  punctuator
  each non-white-space character that cannot be one of the above

and says:

A token is the minimal lexical element of the language in translation phases 7 and 8. The categories of tokens are: keywords, identifiers, constants, string literals, and punctuators. A preprocessing token is the minimal lexical element of the language in translation phases 3 through 6. The categories of preprocessing tokens are: header names, identifiers, preprocessing numbers, character constants, string literals, punctuators, and single non-white-space characters that do not lexically match the other preprocessing token categories. [...]

The different phases of translation are covered in section 5.1.1.2 Translation phases, and I will highlight some of the relevant ones here:

[...]

3. The source file is decomposed into preprocessing tokens and sequences of white-space characters (including comments).

[...]

6. Adjacent string literal tokens are concatenated.

7. White-space characters separating tokens are no longer significant. Each preprocessing token is converted into a token. The resulting tokens are syntactically and semantically analyzed and translated as a translation unit.

[...]

The distinction between preprocessing tokens and tokens may seem irrelevant, but it matters in at least one case: with adjacent string literals such as "%d" "\n", you would have two preprocessing tokens, while after phase 6 there would be only one token.
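The effect of phase 6 can be observed from within a program. The sketch below assumes a C11 compiler (for static_assert); the names joined, single and literals_match are hypothetical. It compares an array initialized from two adjacent literals with one initialized from a single literal.

```c
#include <assert.h>
#include <string.h>

/* Two adjacent string literals: two preprocessing tokens in the source,
   joined into one string literal in translation phase 6. */
static const char joined[] = "%d" "\n";

/* The same text written as a single string literal. */
static const char single[] = "%d\n";

/* Both arrays have the same size: '%', 'd', '\n' and the terminating null. */
static_assert(sizeof joined == sizeof single, "concatenation changed the size");

/* Hypothetical helper: returns 1 when both arrays hold identical bytes. */
int literals_match(void)
{
    return memcmp(joined, single, sizeof joined) == 0;
}
```

By the time the compiler proper analyzes the initializer, there is no trace left of the fact that joined was written as two pieces.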

