Matching plain text with pyparsing

Question

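I can't figure out how to parse plain text as-is (whitespace included) while still being able to match special structures inside the text. Suppose you have a string like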

some plain text
specialStructure
plain text again

What I want is a parser that gives me

['some plain text\n', 'specialStructure', '\nplain text again']
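For the flat case above, the target list is exactly what Python's standard-library `re.split` returns when the separator pattern is wrapped in a capturing group. This is only a reference point for the expected output: it uses `re`, not pyparsing, and it cannot handle nested structures.

```python
import re

s = "some plain text\nspecialStructure\nplain text again"

# A capturing group makes re.split keep each separator in the result list,
# and the text between separators is returned untouched (whitespace included).
parts = re.split(r"(specialStructure)", s)
print(parts)  # ['some plain text\n', 'specialStructure', '\nplain text again']
```

If the structure appears at the very start or end of the input, `re.split` also emits an empty string at that end.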

My first attempt was

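import pyparsing as pp

s = "some plain text\nspecialStructure\nplain text again"

def join_words(toks):
    # re-join the matched words with single spaces
    return ' '.join(toks)

struct = pp.Regex(r'specialStructure')
# a word is a run of letters that is not the start of the special structure
word = ~struct + pp.Word(pp.alphas)
txt = pp.OneOrMore(word).addParseAction(join_words)
grammar = pp.ZeroOrMore(struct | txt)

result = grammar.parseString(s)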

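Even though this gives me the expected result in this case, the problem is that if the plain text contains newlines, tabs, or other kinds of whitespace, I end up with the words separated only by single spaces...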
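The whitespace disappears because pyparsing skips whitespace between tokens by default and `join_words` then re-joins the words with single spaces. One possible fix (a suggestion, not from the original post) is to wrap the word run in pyparsing's `originalTextFor` helper, which returns the slice of the input from the first to the last matched word instead of the individual tokens. `parseString` also expands tabs to spaces by default, so the sketch calls `parseWithTabs()`. Whitespace before the first word and after the last word of each run is still dropped, so this only preserves the spacing *inside* a run.

```python
import pyparsing as pp

struct = pp.Regex(r"specialStructure")
word = ~struct + pp.Word(pp.alphas)
# originalTextFor yields the raw input text spanned by the matched words,
# so tabs, double spaces and newlines between words are kept verbatim
txt = pp.originalTextFor(pp.OneOrMore(word))
grammar = pp.ZeroOrMore(struct | txt)
# without this, parseString would replace tabs with spaces before parsing
grammar.parseWithTabs()

s = "some\tplain  text\nspecialStructure\nplain text again"
result = grammar.parseString(s).asList()
print(result)
```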

How can I match plain text up to the next special structure or the end of the input?

Update

A partial solution I found is to use the SkipTo class:

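import pyparsing as pp

s = "some plain text\nspecialStructure\nplain text again"

struct = pp.Regex(r'specialStructure')
# skip ahead to the next structure or, failing that, to the end of the input
txt = pp.SkipTo(struct) | pp.SkipTo(pp.StringEnd(), include=True)
grammar = pp.ZeroOrMore(struct | txt)

result = grammar.parseString(s)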
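For the flat case, a different route (not from the original post) is to let pyparsing locate only the structures with `scanString`, which yields each match together with its start and end offsets, and to take the plain text as raw slices of the input between matches. The helper name `split_on` is hypothetical. `scanString` also expands tabs before matching unless `parseWithTabs()` is set, so the sketch sets it on a copy of the expression to keep the offsets aligned with the original string.

```python
import pyparsing as pp


def split_on(struct, text):
    """Split text into plain-text runs and struct matches (hypothetical helper)."""
    # keep tabs so that scanString's offsets index into the original text
    struct = struct.copy().parseWithTabs()
    pieces = []
    last = 0
    for tokens, start, end in struct.scanString(text):
        if start > last:
            pieces.append(text[last:start])  # raw slice, whitespace untouched
        pieces.append(tokens[0])
        last = end
    if last < len(text):
        pieces.append(text[last:])  # plain text after the last match
    return pieces


struct = pp.Regex(r"specialStructure")
print(split_on(struct, "some plain text\nspecialStructure\nplain text again"))
```

This does not handle nesting; it only shows that the flat problem can be solved without tokenizing the plain text at all.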

The problem here is nested structures. Suppose you want to parse a more complex string, such as:

s = """
some plain text
nestedStructureBegin
   here we are inside a nested structure
   nestedStructureBegin
      bla bla
   nestedStructureEnd
nestedStructureEnd
some bla bla again.
"""

import pyparsing as pp

grammar = pp.Forward()
begin = pp.Regex(r'nestedStructureBegin').suppress()
end = pp.Regex(r'nestedStructureEnd').suppress()
struct = begin + pp.Group(grammar) + end
keyword = begin | end
txt = pp.SkipTo( keyword ) | pp.SkipTo( pp.StringEnd(), include=True )
grammar << pp.ZeroOrMore( struct | txt )

for parser in [struct, txt]:
    parser.addParseAction(lambda toks: print(toks))

result = grammar.parseString(s)

I think the problem comes from using pp.StringEnd, which never matches inside a nested structure, but I'm not sure what is going on... Any suggestions?

Answer 1

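I found a solution that works well even with nested structures. The idea is to parse the input character by character and then use pp.Combine to reassemble the original plain-text input.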

s = """
some plain text
begin
   we are inside a nested structure
   begin
      some more depth
   end
end
and finally some more bla bla...
"""

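import pyparsing as pp

grammar = pp.Forward()
begin = pp.Regex(r'begin').suppress()
end = pp.Regex(r'end').suppress()
keyword = begin | end
# a block is "begin", any (possibly nested) content, then "end"
block = begin + pp.Group(grammar) + end
# a single character of plain text, as long as no keyword starts at this position
char = ~keyword + pp.Regex(r'[\s\S]')
chars = pp.OneOrMore(char)
# Combine glues the characters back into one string and turns off
# whitespace skipping for its contents, so spaces and newlines are kept
txt = pp.Combine(chars)
grammar <<= pp.ZeroOrMore(block | txt)

result = grammar.parseString(s)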
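One caveat with this answer (an observation about the code, not something the post discusses): pp.Regex(r'end') also matches the "end" inside ordinary words such as "weekend", which would cut a plain-text run short. A possible fix, sketched below, swaps in pp.Keyword, which only matches when the neighbouring characters are not letters, digits, `_` or `$`. The helper name `make_grammar` and the sample string are illustrative. Whether the whitespace immediately after a keyword ends up in the following text run may differ between pyparsing versions, so the checks below compare those run edges with `strip()`.

```python
import pyparsing as pp


def make_grammar():
    """Answer 1's grammar with Keyword instead of Regex (hypothetical helper)."""
    grammar = pp.Forward()
    # Keyword, unlike Regex, will not match "end" inside a word like "weekend"
    begin = pp.Keyword("begin").suppress()
    end = pp.Keyword("end").suppress()
    keyword = begin | end
    block = begin + pp.Group(grammar) + end
    # one character of plain text, provided no keyword starts here
    char = ~keyword + pp.Regex(r"[\s\S]")
    txt = pp.Combine(pp.OneOrMore(char))
    grammar <<= pp.ZeroOrMore(block | txt)
    return grammar


s = "see you at the weekend\nbegin\ninside\nend\nbye"
result = make_grammar().parseString(s).asList()
print(result)
```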

Solution 1 (score 0), 2017-08-11 09:49:41