Parse multiline text using the Parsimonious Python library
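I am trying to parse multi-line text with the Python Parsimonious library. I have played with it for a while and cannot figure out how to handle newlines effectively. An example is below, and the behavior it shows makes sense to me. I saw a comment from Erik Rose in a Parsimonious issue, but I cannot work out how to implement it without errors. Thanks for any tips...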
from parsimonious.grammar import Grammar
singleline_text = '''\
FIRST something cool'''
multiline_text = '''\
FIRST something very
cool
SECOND more awesomeness
'''
grammar = Grammar(
"""
bin = ORDER spaces description
ORDER = 'FIRST' / 'SECOND'
spaces = ~'\s*'
description = ~'[A-z0-9 ]*'
""")
This works for the single-line text. print(grammar.parse(singleline_text))
gives:
<Node called "bin" matching "FIRST something cool">
    <Node called "ORDER" matching "FIRST">
        <Node matching "FIRST">
    <RegexNode called "spaces" matching " ">
    <RegexNode called "description" matching "something cool">
But the multi-line text raises a problem that I could not resolve based on the link above. print(grammar.parse(multiline_text))
gives:
---------------------------------------------------------------------------
IncompleteParseError Traceback (most recent call last)
<ipython-input-123-c346891dc883> in <module>()
----> 1 print(grammar.parse(multiline_text))
/Users/me/anaconda3/lib/python3.6/site-packages/parsimonious/grammar.py in parse(self, text, pos)
121 """
122 self._check_default_rule()
--> 123 return self.default_rule.parse(text, pos=pos)
124
125 def match(self, text, pos=0):
/Users/me/anaconda3/lib/python3.6/site-packages/parsimonious/expressions.py in parse(self, text, pos)
110 node = self.match(text, pos=pos)
111 if node.end < len(text):
--> 112 raise IncompleteParseError(text, node.end, self)
113 return node
114
IncompleteParseError: Rule 'bin' matched in its entirety, but it didn't consume all the text. The non-matching portion of the text begins with '
cool
SECOND' (line 1, column 23).
Here is one thing I tried that did not work:
grammar2 = Grammar(
"""
bin = ORDER spaces description newline
ORDER = 'FIRST' / 'SECOND'
spaces = ~'\s*'
description = ~'[A-z0-9 \n]*'
newline = ~r'#[^\r\n]*'
""")
print(grammar2.parse(multiline_text))
(truncated from a 211-line stack trace):
ERROR:root:An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 4))
---------------------------------------------------------------------------
SyntaxError Traceback (most recent call last)
...
VisitationError: SyntaxError: EOL while scanning string literal (<unknown>, line 1)
Parse tree:
<Node called "spaceless_literal" matching "'[A-z0-9
]*'"> <-- *** We were here. ***
<RegexNode matching "'[A-z0-9
]*'">
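A likely cause of this second failure, offered as a reading of the traceback rather than a confirmed diagnosis: grammar2 is written in a normal (non-raw) triple-quoted string, so Python turns \n into a real line break before Parsimonious ever sees the grammar. The quoted regex then spans two lines, which matches the split '[A-z0-9 / ]*' fragment in the parse tree above. The \s in the first grammar survived only because \s is not a Python string escape. Separately, the newline rule as written requires a '#' character, so it would not match a bare line break. The sketch below uses only the standard library, and the variable names are illustrative.

```python
# The same grammar rule written in a normal string and in a raw string.
non_raw_rule = "description = ~'[A-z0-9 \n]*'"
raw_rule = r"description = ~'[A-z0-9 \n]*'"

# In the normal string, Python has already replaced \n with a line break,
# so the quoted regex is split across two lines of grammar text.
print(non_raw_rule.splitlines())

# In the raw string, the backslash and the n survive as two characters,
# leaving the escape for the grammar's own string handling to interpret.
print(raw_rule.splitlines())
```

Prefixing the grammar string with r, as the answer below does, avoids this particular error.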
It looks like you need to repeat the bin element in your grammar:
grammar = Grammar(
r"""
one = bin +
bin = ORDER spaces description newline
ORDER = 'FIRST' / 'SECOND'
newline = ~"\n*"
spaces = ~"\s*"
description = ~"[A-z0-9 ]*"i
""")
With that, you can parse something like this:
multiline_text = '''\
FIRST something very cool
SECOND more awesomeness
SECOND even better
'''