Confusion around priority of tokens in lark grammar

Following up on an earlier question, I'm a bit confused about the precedence of the /.+/ regex in the line rule. I would expect the test below to produce:

  line
  line  x
  chunk abc

Instead, I get:

  line
  line  x
  line  abc

Here is the test:

    from lark import Lark

    def test_tokenizing(self):
        # The priority (-1) is attached to the *rule* `line` here.
        p = Lark(r"""

        _NL: /\n/
        line.-1: /.+/? _NL
        chunk: /abc/ _NL
        start: (line|chunk)+

        """, parser='lalr')

        text = '\nx\nabc\n'
        print(p.parse(text).pretty())

In Lark, priorities mean different things for rules and for terminals.

As a quick reminder: rules have lowercase names, while terminals have UPPERCASE names.

In LALR mode, priorities on rules only affect which rule is chosen in case of a reduce/reduce collision. They have no effect on the terminals inside the rule.

What you want is to change the priority on the terminal itself:

from lark import Lark

def test_tokenizing():
    # The priority (-1) now sits on the terminal EVERYTHING, not on the rule.
    p = Lark(r"""

    _NL: /\n/
    line: EVERYTHING? _NL
    EVERYTHING.-1: /.+/
    chunk: /abc/ _NL
    start: (line|chunk)+

    """, parser='lalr')

    text = '\nx\nabc\n'
    print(p.parse(text).pretty())
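
To illustrate why moving the priority onto the terminal changes the result, here is a toy model of a priority-ordered lexer. It is only a sketch and not Lark's actual implementation: the `tokenize` helper, the tuple layout, and the hand-written `max_width` values are all made up for this example. It assumes the ordering described in Lark's grammar documentation, where higher-priority terminals are tried first and ties go to the pattern that can match more text.

```python
import re

# Toy model of a priority-ordered lexer. This is NOT Lark's implementation;
# the function name and tuple layout are hypothetical.
def tokenize(text, terminals):
    """terminals: list of (name, regex, priority, max_width) tuples."""
    # Higher priority first; ties go to the pattern that can match more text.
    ordered = sorted(terminals, key=lambda t: (-t[2], -t[3]))
    compiled = [(name, re.compile(rx)) for name, rx, _, _ in ordered]
    pos, tokens = 0, []
    while pos < len(text):
        for name, rx in compiled:
            m = rx.match(text, pos)
            if m and m.end() > pos:
                # The first terminal in the ordering that matches wins.
                tokens.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"no terminal matches at position {pos}")
    return tokens

BIG = 10**6  # stand-in for the unbounded width of /.+/

text = "\nx\nabc\n"

# Equal priorities: /.+/ can match more text, so it is tried before /abc/.
default = [("_NL", r"\n", 0, 1), ("EVERYTHING", r".+", 0, BIG), ("ABC", r"abc", 0, 3)]

# EVERYTHING lowered to -1: /abc/ is now tried first.
lowered = [("_NL", r"\n", 0, 1), ("EVERYTHING", r".+", -1, BIG), ("ABC", r"abc", 0, 3)]

print(tokenize(text, default))
print(tokenize(text, lowered))
```

With equal priorities the text abc comes out as an EVERYTHING token, so the parser can only build a line from it. With EVERYTHING lowered to -1 it comes out as an ABC token, which is what chunk needs. One consequence of this ordering is that a line that merely starts with abc would also begin with an ABC token.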
