
Confusion around priority of tokens in lark grammar

Following up from an earlier question, I'm a bit confused about the precedence of the /.+/ regex line. I would expect the test below to produce:

  line
  line  x
  chunk abc

Instead I get:

  line
  line  x
  line  abc

    def test_tokenizing(self):
        p = Lark(r"""

        _NL: /\n/
        line.-1: /.+/? _NL
        chunk: /abc/ _NL
        start: (line|chunk)+

        """, parser='lalr')

        text = '\nx\nabc\n'
        print(p.parse(text).pretty())

In Lark, priorities mean different things for rules and for terminals.

Just a quick reminder: rules have lowercase names, while terminals have UPPERCASE names.

In LALR mode, priorities on rules only affect which rule is chosen in case of a reduce/reduce collision. They have no effect on the terminals inside the rule, so line.-1 does not make the anonymous /.+/ terminal any less preferred than /abc/.

What you want is to change the priority on the terminal itself:

from lark import Lark

def test_tokenizing():
    p = Lark(r"""

    _NL: /\n/
    line: EVERYTHING? _NL
    EVERYTHING.-1: /.+/
    chunk: /abc/ _NL
    start: (line|chunk)+

    """, parser='lalr')

    text = '\nx\nabc\n'
    print(p.parse(text).pretty())
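
To see why moving the priority onto the terminal changes the result, here is a toy model of priority-based terminal selection, written with only the standard `re` module. It is a simplified sketch, not Lark's actual lexer: the `pick_terminal` helper and the terminal name `ABC` are hypothetical, and ties between equal priorities are resolved by listing order, which is an assumption standing in for Lark's own tie-breaking rules.

```python
import re

def pick_terminal(terminals, text, pos=0):
    """Return (name, matched_text) for the winning terminal at `pos`, or None.

    `terminals` is a list of (name, regex, priority) tuples. Candidates are
    tried in order of descending priority; ties keep their listing order
    (an assumption, not Lark's real tie-breaking logic).
    """
    # sorted() is stable, so equal priorities keep their original order.
    ordered = sorted(terminals, key=lambda t: -t[2])
    for name, pattern, _priority in ordered:
        m = re.compile(pattern).match(text, pos)
        if m and m.end() > pos:
            return name, m.group()
    return None

# Question's grammar: the priority sat on the rule, so both terminals keep
# the default priority 0 and the catch-all wins the collision in this model.
before = [("EVERYTHING", r".+", 0), ("ABC", r"abc", 0)]

# Answer's grammar: EVERYTHING.-1 ranks the catch-all below /abc/.
after = [("EVERYTHING", r".+", -1), ("ABC", r"abc", 0)]

print(pick_terminal(before, "abc\n"))  # ('EVERYTHING', 'abc')
print(pick_terminal(after, "abc\n"))   # ('ABC', 'abc')
print(pick_terminal(after, "x\n"))     # ('EVERYTHING', 'x')
```

In this model the input "x" still falls through to the catch-all, because /abc/ does not match it; lowering a terminal's priority only matters when two terminals can match at the same position.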
