

What is this? Looking for the correct terminology for what is going on here

Consider the following grammar, which has an obvious flaw as far as parser generators are concerned:

"Start Symbol" = <Foo>
"Case Sensitive" = True
"Character Mapping" = 'Unicode'

{A} = {Digit}
{B} = [abcdefABCDEF]
{C} = {A} + {B}

Integer = {A}+
HexNumber = {C}+


<ContextA> ::= '[' HexNumber ']'
<ContextB> ::= '{' Integer '}'                      
<Number> ::= <ContextA> | <ContextB>
<Foo> ::= <Number> <Foo>
       | <>

The reason this grammar is flawed is that the scanner cannot distinguish between the terminals Integer and HexNumber. (Is 1234 an integer or a hex number?)

In the productions written in this example, this hardly matters, but there might be grammars where the context of the productions makes clear whether an integer or a hex number is expected, and the scanner would still refuse to cooperate.

So the scanner would need to know the parser state in order to make the right decision between the hex and the integer token.
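As a minimal sketch of what "the scanner knowing the parser state" could look like in a hand-written scanner: the caller passes in which kind of number is expected, and the scanner picks the character class accordingly. The names `scan_number`, `enum expect` and the token codes are hypothetical and are not part of any parser generator's API.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes for the two terminals of the grammar. */
enum token { TOK_ERROR = 0, TOK_INTEGER, TOK_HEXNUMBER };

/* Hypothetical summary of the parser state: which terminal is expected next. */
enum expect { EXPECT_INTEGER, EXPECT_HEXNUMBER };

/* Scan a number at the start of s, using the expectation supplied by the
 * caller to decide which characters belong to the token.
 * Stores the token length in *len; returns TOK_ERROR if nothing matched. */
enum token scan_number(const char *s, enum expect want, size_t *len) {
    assert(s != NULL && len != NULL);
    size_t n = 0;
    if (want == EXPECT_HEXNUMBER) {
        while (isxdigit((unsigned char)s[n])) n++;
    } else {
        while (isdigit((unsigned char)s[n])) n++;
    }
    *len = n;
    if (n == 0) return TOK_ERROR;
    return want == EXPECT_HEXNUMBER ? TOK_HEXNUMBER : TOK_INTEGER;
}
```

With this sketch, the text 1234 comes back as TOK_HEXNUMBER after a '[' and as TOK_INTEGER after a '{', because the caller supplies a different expectation in each case.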

Now the question about terminology: what does this make this grammar, or rather this lexer? A context-sensitive lexer? Or would one say this is a context-sensitive grammar, even though it is clearly a scanner problem? Is there other terminology used to describe such phenomena?

Context sensitive means something quite different.

If you were to use a more formal notation, you'd see that your original grammar was ambiguous, as Ignacio Vazquez-Abrams said, and that your edited grammar could be handled fine by an LR(1) (or even LL(1)) parser generator. Here is an unproblematic bison grammar:

%start number
%%
digit : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
hex   : digit
      | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' 
      | 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
decnum: digit | decnum digit
hexnum: hex   | hexnum hex
number: '[' decnum ']'
      | '{' hexnum '}'

It's not usual to use bison to create a scanner, of course, but it is certainly possible.

I think the problem you are contemplating is this: if we build a scanner using flex, it would look like this:

[[:digit:]]+  { yylval.string = strdup(yytext); return DECNUM; }
[[:xdigit:]]+ { yylval.string = strdup(yytext); return HEXNUM; }

Flex cannot return an ambiguous token, so when the next part of the input is 1234, flex needs to return either DECNUM or HEXNUM. The first-longest ("maximal munch") rule means that whichever pattern comes first in the flex definition wins for a token that could be scanned either way. That implies that the DECNUM pattern needs to come first: every string of digits is also a string of hex digits, so if the HEXNUM pattern came first, the DECNUM pattern could never trigger (and flex will warn about that).
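The matching rule described above can be sketched in plain C as follows. This is only an illustration of the selection rule (longest match wins, ties go to the earlier rule), not how flex is actually implemented; the function `scan` and the token codes are hypothetical stand-ins for what flex and bison would generate.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token codes, standing in for the ones bison would define. */
enum token { TOK_NONE = 0, DECNUM, HEXNUM };

/* Length of the longest prefix of s consisting only of decimal digits. */
static size_t match_digits(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Length of the longest prefix of s consisting only of hex digits. */
static size_t match_xdigits(const char *s) {
    size_t n = 0;
    while (isxdigit((unsigned char)s[n])) n++;
    return n;
}

/* Choose the rule with the longest match. On a tie the earlier rule wins,
 * and DECNUM is listed first, so ">=" encodes the tie-break. */
enum token scan(const char *s, size_t *len) {
    assert(s != NULL && len != NULL);
    size_t dec = match_digits(s);
    size_t hex = match_xdigits(s);
    if (dec == 0 && hex == 0) { *len = 0; return TOK_NONE; }
    if (dec >= hex) { *len = dec; return DECNUM; }
    *len = hex;
    return HEXNUM;
}
```

For the input 1234 both patterns match four characters, so the earlier rule (DECNUM) is chosen; for 12ab the hex pattern matches more characters, so HEXNUM is chosen regardless of rule order.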

But now there is a minor problem for the grammar, because when the grammar is expecting a HEXNUM, it needs to be prepared to find a DECNUM. That's not a problem, provided the grammar is unambiguous. We only need to create a couple of non-terminals:

decnum: DECNUM           { $$ = strtol($1, NULL, 10); free($1); }
hexnum: DECNUM           { $$ = strtol($1, NULL, 16); free($1); }
      | HEXNUM           { $$ = strtol($1, NULL, 16); free($1); }

That will not create an ambiguity, or even a shift/reduce conflict, that doesn't already exist in the grammar.

If you want to try this, you'll need to declare some types in the declarations section of your bison file:

%union {
   char* string;
   long  integer;
}
%token <string> HEXNUM DECNUM
%type <integer> hexnum decnum
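To see why hexnum has to accept a DECNUM token and reinterpret its text, here is a small sketch of the two conversions performed by the actions above. The helper names `to_decnum` and `to_hexnum` are hypothetical; only `strtol` comes from the C standard library.

```c
#include <assert.h>
#include <stdlib.h>

/* Interpret the token text as a decimal number, as the decnum action does. */
long to_decnum(const char *text) {
    assert(text != NULL);
    return strtol(text, NULL, 10);
}

/* Interpret the token text as a hexadecimal number, as the hexnum action
 * does, even when the scanner classified the text as DECNUM. */
long to_hexnum(const char *text) {
    assert(text != NULL);
    return strtol(text, NULL, 16);
}
```

The same token text 1234 yields 1234 through `to_decnum` and 4660 (0x1234) through `to_hexnum`, so the value depends on which non-terminal the parser reduces to, not on which token code the scanner returned.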

This grammar could be described as ambiguous.
