Recognizing multi-line sections with a Lark grammar

Question

I'm trying to write a simple grammar to parse text with multi-line sections, but I can't wrap my head around how to do it. Here's the grammar I've written so far; I'd appreciate any help.

PS: I realize that Lark is overkill for this problem, but this is just a very simplified version of what I'm actually trying to parse.

from unittest import TestCase
from lark import Lark

text = '''
[section 1]
line 1.1
line 1.2

[section 2]
line 2.1
'''

class TestLexer(TestCase):

    def test_basic(self):
        p = Lark(r"""

            _LB: "["
            _RB: "]"
            _NL: /\n/+
            name: /[^]]+/
            content: /.+/s

            section: _NL* _LB name _RB _NL* content
            doc: section*

        """, parser='lalr', start='doc')


        parsed = p.parse(text)

Answer 1

The problem is that your content regex, /.+/s, can match anywhere and with any length (the s flag lets . match newlines as well), so it swallows the rest of the input and the rest of the grammar can't work correctly. Instead, use a terminal restricted to a single line and give it a lower priority than the rest.
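One way to see the problem is to run the question's content pattern, /.+/s, through Python's re module on the sample text. This is an illustration with plain re rather than Lark's own lexer, and the variable names are made up for the example: with the s (DOTALL) flag the greedy match runs to the end of the input and swallows the second section header, while the single-line /.+/ stops at the first newline.

```python
import re

text = '''
[section 1]
line 1.1
line 1.2

[section 2]
line 2.1
'''

# Start matching where the first content line begins.
start = text.index('line 1.1')

# The question's content pattern, /.+/s: with DOTALL the dot also matches
# "\n", so the greedy match consumes everything up to the end of the input.
greedy = re.compile(r'.+', re.DOTALL).match(text, start).group()

# The same pattern without the s flag stops at the next newline,
# so it only ever covers a single line.
one_line = re.compile(r'.+').match(text, start).group()

print(repr(greedy))
print(repr(one_line))
```

This is why the answer's ANY_LINE drops the s flag and lets _NL handle the line breaks explicitly.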

p = Lark(r"""

    _NL: /\n/+
    name: /[^]]+/
    content: (ANY_LINE _NL)+
    // Priority -1 lets the "[" literal win over ANY_LINE
    // when a line could be lexed as either one.
    ANY_LINE.-1: /.+/

    section: _NL* "[" name "]" _NL* content
    doc: section*

""", parser='lalr', start='doc')

You may need some extra work to turn the content rule into exactly what you want, but since you say this isn't your actual problem, I won't bother with that here.

Answer 1 (score 0), posted 2022-01-17 07:32:08