Including newlines and any other characters in a regular expression - Python

Question

I'm currently using regex in Python, in a Jupyter Notebook, to build a dictionary of words and definitions from a dictionary file in .txt format.

Sample data from the text file:

ABACINATE\nA*bac"i*nate, vt Etym: [LL. abacinatus, pp of abacinare; ab off +\nbacinus a basin.]\n\nDefn: To blind by a red-hot metal plate held before the eyes. [R.]\n\nABACINATION\nA*bac`i*na"tion, n.\n\nDefn: The act of abacinating. [R.]\n\n

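The pattern I'm trying to build captures the headword, written in all capital letters, and then discards the text up to the definition.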

Desired output:

{'word': 'ABACINATE', 'definition': 'To blind by a red-hot metal plate held before the eyes.'}
{'word': 'ABACINATION', 'definition': 'The act of abacinating.'}

The pattern I've tried so far:

pattern="""
(?P<word>[A-Z*]{3,}) #retrieve capital letter word
(\n.*\n\n\Defn:) #ignore all text up until Defn:
(?P<definition>\w*) #retrieve any worded character after Defn:
(.\ ) #end at the full stop and space
"""
for item in re.finditer(pattern,all_words,re.VERBOSE):
    print(item.groupdict())
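Two details of this attempt can be checked in isolation. The minimal sketch below uses made-up sample strings: in a non-raw string, Python turns "\n" into a real newline before re ever sees it, and re.VERBOSE then ignores that newline as whitespace; separately, "\D" is the non-digit class rather than a literal "D".

```python
import re

text = "ABC\nDefn:"

# Non-raw string: Python turns "\n" into a real newline before re sees it,
# and re.VERBOSE then discards that newline as insignificant whitespace,
# so the pattern is effectively "ABCDefn:" and finds nothing.
cooked = re.search("ABC\nDefn:", text, re.VERBOSE)

# Raw string: re receives backslash + "n", i.e. the newline escape, so it matches.
raw = re.search(r"ABC\nDefn:", text, re.VERBOSE)

print(cooked)           # None
print(raw is not None)  # True

# "\D" is the non-digit class, not a literal "D": r"\Defn:" also matches "xefn:".
print(bool(re.search(r"\Defn:", "xefn:")))  # True
```

Writing verbose patterns as raw strings (r"""...""") avoids the first surprise; a literal "D" needs no backslash at all.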

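I'm struggling with the newlines here. I'm trying to isolate the capitalized word, start immediately at the following newline, ignore every character up to the two newlines before 'Defn:', and then capture the definition, ending at the full stop.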

Is there a way to handle newlines like this?
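For reference, Python's re module offers two common ways to let a match cross line breaks: the re.DOTALL flag, or a character class such as [\s\S] that matches every character. A minimal sketch on a shortened version of the sample:

```python
import re

# Shortened sample: the headword and 'Defn:' are on different lines.
text = 'ABACINATE\nA*bac"i*nate, v.t.\n\nDefn: To blind.'

# By default '.' matches anything EXCEPT '\n', so '.*' cannot get from
# the headword line to the 'Defn:' line and the search fails.
plain = re.search(r"ABACINATE.*Defn:", text)

# re.DOTALL (alias re.S) makes '.' match '\n' too.
dotall = re.search(r"ABACINATE.*Defn:", text, re.DOTALL)

# [\s\S] ("whitespace or non-whitespace") matches every character,
# newlines included, with no flag needed.
char_class = re.search(r"ABACINATE[\s\S]*Defn:", text)

print(plain)                                 # None
print(dotall.group())                        # spans all three lines up to 'Defn:'
print(char_class.group() == dotall.group())  # True
```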

Answer 1

You mostly have it; you're just missing a non-greedy match and a wider set of characters for the definition.
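Both points can be seen on a small made-up sample. The sketch below contrasts a greedy and a non-greedy skip to "Defn:", then compares \w* with the wider class [a-zA-Z -]*:

```python
import re

# Shortened, made-up sample with two entries.
text = "ABACINATE\n...\nDefn: first.\n\nABACINATION\n...\nDefn: second.\n"

# Greedy [\s\S]* runs to the end, then backs up to the LAST 'Defn:',
# so one match swallows both entries.
greedy = re.findall(r"([A-Z]{3,})[\s\S]*Defn:", text)
print(greedy)  # ['ABACINATE']

# Non-greedy [\s\S]*? stops at the FIRST 'Defn:' after each headword.
lazy = re.findall(r"([A-Z]{3,})[\s\S]*?Defn:", text)
print(lazy)  # ['ABACINATE', 'ABACINATION']

# \w* stops at the first non-word character (here the leading space),
# whereas [a-zA-Z -]* also accepts spaces and hyphens.
after_defn = " To blind by a red-hot metal plate held before the eyes. [R.]"
print(repr(re.match(r"\w*", after_defn).group()))  # ''
print(repr(re.match(r"[a-zA-Z -]*", after_defn).group()))
# ' To blind by a red-hot metal plate held before the eyes'
```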

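import re

# all_words is a normal (non-raw) string, so each \n is a real newline.
all_words = """ABACINATE\nA*bac"i*nate, v.t. Etym: [LL. abacinatus, p.p. of abacinare; ab off +\nbacinus a basin.]\n\nDefn: To blind by a red-hot metal plate held before the eyes. [R.]\n\nABACINATION\nA*bac`i*na"tion, n.\n\nDefn: The act of abacinating. [R.]\n\n"""

# Raw string so backslash escapes such as \s reach the regex engine unchanged.
pattern = r"""
(?P<word>[A-Z*]{3,})         # headword: three or more capitals (or asterisks)
([\s\S]*?Defn:)              # lazily skip anything, newlines included, up to the first 'Defn:'
(?P<definition>[a-zA-Z -]*)  # letters, spaces and hyphens; stops at the first '.'
"""
for item in re.finditer(pattern, all_words, re.VERBOSE):
    print(item.groupdict())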

{'word': 'ABACINATE', 'definition': ' To blind by a red-hot metal plate held before the eyes'}
{'word': 'ABACINATION', 'definition': ' The act of abacinating'}
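If the goal is the exact output the question asks for (no leading space, trailing full stop kept, "[R.]" dropped), one possible variant is sketched below. It assumes that each headword sits alone on its own line in A-Z capitals and that each definition follows "Defn:" at the start of a line and ends at the first full stop followed by whitespace. These are assumptions about the file format, checked only against the sample data above.

```python
import re

# Sample data from the answer's code; \n are real newlines (non-raw string).
all_words = """ABACINATE\nA*bac"i*nate, v.t. Etym: [LL. abacinatus, p.p. of abacinare; ab off +\nbacinus a basin.]\n\nDefn: To blind by a red-hot metal plate held before the eyes. [R.]\n\nABACINATION\nA*bac`i*na"tion, n.\n\nDefn: The act of abacinating. [R.]\n\n"""

# Assumed layout: headword alone on a line, definition introduced by 'Defn:'
# at the start of a line, first sentence ending in '.' plus whitespace.
pattern = re.compile(
    r"""
    ^(?P<word>[A-Z]{3,})$      # headword: a whole line of capital letters
    .*?                        # lazily skip pronunciation/etymology (DOTALL: crosses lines)
    ^Defn:\s*                  # 'Defn:' at the start of a line, then any spaces
    (?P<definition>.*?\.)      # shortest text up to and including a '.'
    (?=\s|\Z)                  # ...that is followed by whitespace or the end of input
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)

entries = [
    # Collapse internal whitespace so a definition wrapped over lines becomes one line.
    {"word": m["word"], "definition": " ".join(m["definition"].split())}
    for m in pattern.finditer(all_words)
]

for entry in entries:
    print(entry)
# Prints:
# {'word': 'ABACINATE', 'definition': 'To blind by a red-hot metal plate held before the eyes.'}
# {'word': 'ABACINATION', 'definition': 'The act of abacinating.'}
```

With re.MULTILINE, ^ and $ match at line boundaries, so capitalized words inside the etymology are not taken as headwords; re.DOTALL lets the lazy .*? skip across lines. A definition containing an abbreviation such as "e.g." would be cut short, and an entry with no "Defn:" line would be merged into the next one, so the full file may need further adjustment.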

Answer 1 (score 0), 2021-04-15 01:01:13
