Why is a binary operator selected over a unary operator?

For operators that act as both unary and binary, why is the binary one selected in an expression like a@b?

After a lot of thought and searching, I still haven't been able to answer why something like a+b is parsed as a binary expression instead of a(+b), which would obviously be gibberish.

I don't think a context-free grammar would be able to distinguish the two, and searching this version of the standard has not given me an answer.

Does the parser choose the binary version specifically because the unary version would be gibberish? If so, is there a section in the standard that outlines this?

Context-free doesn't mean "no state". A parser keeps a lot of state to track which grammatical rules are possible given the tokens it has seen so far, and to predict which tokens can come next. Because there's no rule that says two expressions can appear directly adjacent to each other, it'll never even consider that a+b could be the expressions a and +b side by side.

For example, let's say that we're using this rudimentary grammar:

expr → expr '+' unary_expr | unary_expr
unary_expr → '+' unary_expr | IDENT

(Notation: → gives the rules a non-terminal can expand to and | indicates alternate possibilities. '+' is the plus token and IDENT is any identifier token.)

Let's parse a+b. Our parser's starting state will be:

1. expr → expr '+' unary_expr
         ^
2. expr → unary_expr
         ^
3. unary_expr → '+' unary_expr
               ^
4. unary_expr → IDENT
               ^

These are the rules it's considering at the start. It doesn't know which one it's going to get; it could be any of them. Notice that each production it's considering also includes a cursor, which I've marked with ^ carets above. The cursor shows where in the rule the parser is.

Okay, so now it sees the first IDENT token. It updates its state to the following:

1. expr → expr '+' unary_expr
              ^
2. expr → unary_expr
                    ^
3. unary_expr → IDENT
                     ^

Now there are three rules that it's considering. Notice that the cursor has moved along to the right in each of them.

If the first rule is correct then it has just seen an expression and is expecting a '+' next. Alternatively, maybe the second rule is the correct one and a is just a unary expression. In that case it expects no more tokens to follow. The parser doesn't know which it's going to be, so it's considering both.

You'll see that if the next token is a '+' then it must be a binary plus. Why? Because the first rule is the only rule that anticipates a '+' token next.

For '+' to be interpreted as a unary plus, the parser would have to have this rule in its active state, with the cursor before the '+':

unary_expr → '+' unary_expr 
            ^

And you can see that it doesn't.
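
The walk-through above can be reproduced mechanically. What follows is a minimal sketch of an Earley-style recognizer for the toy grammar, not a description of how any real compiler is built. The names `GRAMMAR`, `Item`, `next_symbol`, `show`, `close` and `item_sets` are made up for this illustration, and the input is assumed to be already tokenized into 'IDENT' and '+'.

```python
from typing import NamedTuple

# The toy grammar from the answer above. Keys are non-terminals;
# every other symbol ("+", "IDENT") is a token.
GRAMMAR = {
    "expr": [("expr", "+", "unary_expr"), ("unary_expr",)],
    "unary_expr": [("+", "unary_expr"), ("IDENT",)],
}


class Item(NamedTuple):
    lhs: str      # non-terminal being expanded
    rhs: tuple    # right-hand side of the production
    dot: int      # cursor position within rhs
    origin: int   # index of the item set where this rule was started


def next_symbol(item):
    """Symbol just after the cursor, or None if the rule is complete."""
    return item.rhs[item.dot] if item.dot < len(item.rhs) else None


def show(item):
    """Render an item with ^ marking the cursor."""
    symbols = list(item.rhs)
    symbols.insert(item.dot, "^")
    return f"{item.lhs} → {' '.join(symbols)}"


def close(seed, chart):
    """Grow a seed item set with predictions and completions."""
    state, seen, i = list(seed), set(seed), 0
    while i < len(state):
        item = state[i]
        i += 1
        symbol = next_symbol(item)
        if symbol in GRAMMAR:
            # Predict: a non-terminal is expected, so start all of its rules here.
            new = [Item(symbol, rhs, 0, len(chart)) for rhs in GRAMMAR[symbol]]
        elif symbol is None:
            # Complete: advance every rule that was waiting for item.lhs.
            waiting = chart[item.origin] if item.origin < len(chart) else state
            new = [w._replace(dot=w.dot + 1) for w in waiting
                   if next_symbol(w) == item.lhs]
        else:
            new = []  # a token is expected; nothing to do until it is scanned
        for n in new:
            if n not in seen:
                seen.add(n)
                state.append(n)
    return state


def item_sets(tokens, start="expr"):
    """Return the list of item sets: one before any token, one after each token."""
    chart = []
    chart.append(close([Item(start, rhs, 0, 0) for rhs in GRAMMAR[start]], chart))
    for token in tokens:
        # Scan: keep only the rules that expected this token, cursor moved past it.
        scanned = [it._replace(dot=it.dot + 1) for it in chart[-1]
                   if next_symbol(it) == token]
        chart.append(close(scanned, chart))
    return chart


chart = item_sets(["IDENT", "+", "IDENT"])  # the tokens of a+b
for item in chart[1]:                       # state after the first IDENT
    if next_symbol(item) == "+":
        print(show(item))                   # prints: expr → expr ^ + unary_expr
```

The set after the first IDENT holds the same three rules as in the walk-through, and only the binary rule has its cursor in front of a '+'. The unary rule with the cursor before '+' only shows up in `chart[2]`, after the binary '+' has been consumed, which is why an input like a + +b would still be accepted under this grammar.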


If context-free doesn't mean stateless, what does it mean, then? What "context" are we "free" from?

Context-free is a restriction on what rules the grammar can contain. The opposite is context-sensitive, where productions can vary based on their surroundings. Context-sensitive grammars are more powerful than context-free grammars, but they are much more difficult to parse, even for humans. Language theoreticians figured out early on that context-free grammars occupy a sweet spot: powerful enough to be expressive without being overwhelmingly complex to reason about.

For more details see: Context-free grammars versus context-sensitive grammars?

A context-free grammar (CFG) is a grammar where (as you noted) each production has the form A → w, where A is a nonterminal and w is a string of terminals and nonterminals. Informally, a CFG is a grammar where any nonterminal can be expanded out to any of its productions at any point. The language of a grammar is the set of strings of terminals that can be derived from the start symbol.
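
To make "the language of a grammar" concrete, here is a small sketch that enumerates the short strings derivable from the toy expr grammar used earlier on this page. The function name `language` and the `max_len` cut-off are assumptions made for this illustration. The pruning by length is only sound because this particular grammar has no empty productions, so a sentential form never gets shorter as it is expanded.

```python
from collections import deque

# The toy grammar from earlier on this page, as data.
GRAMMAR = {
    "expr": [("expr", "+", "unary_expr"), ("unary_expr",)],
    "unary_expr": [("+", "unary_expr"), ("IDENT",)],
}


def language(start="expr", max_len=3):
    """Return every token string of at most max_len tokens derivable from start."""
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal in the current sentential form.
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:
            found.add(" ".join(form))  # only tokens left: a string of the language
            continue
        # Context-free: the choice of production never depends on the neighbours.
        for rhs in GRAMMAR[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


print(sorted(language()))
# prints: ['+ + IDENT', '+ IDENT', 'IDENT', 'IDENT + IDENT']
```

Note that 'IDENT IDENT' never appears: no sequence of expansions puts two expressions side by side, which is the same observation the first answer makes from the parser's point of view.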

A context-sensitive grammar (CSG) is a grammar where each production has the form wAx → wyx, where w and x are strings of terminals and nonterminals and y is also a string of terminals and nonterminals. In other words, the productions give rules saying "if you see A in a given context, you may replace A by the string y." It's unfortunate that these grammars are called "context-sensitive grammars", because it means that "context-free" and "context-sensitive" are not opposites, and that there are certain classes of grammars that arguably take a lot of contextual information into account but aren't formally considered to be context-sensitive.

