Including newline character with any other character in regex - Python

I am currently using Jupyter Notebook and regex in Python to build a dictionary of words and definitions from a dictionary file in .txt format.

Sample data from the text file: ABACINATE\nA*bac"i*nate, v.t. Etym: [LL. abacinatus, p.p. of abacinare; ab off +\nbacinus a basin.]\n\nDefn: To blind by a red-hot metal plate held before the eyes. [R.]\n\nABACINATION\nA*bac`i*na"tion, n.\n\nDefn: The act of abacinating. [R.]\n\n

The pattern I am trying to build captures the all-capitals headword, then skips the text up to the definition.

Desired output

{'word': 'ABACINATE', 'definition': 'To blind by a red-hot metal plate held before the eyes.'}
{'word': 'ABACINATION', 'definition': 'The act of abacinating.'}

The pattern I have tried so far is

pattern="""
(?P<word>[A-Z*]{3,}) #retrieve capital letter word
(\n.*\n\n\Defn:) #ignore all text up until Defn:
(?P<definition>\w*) #retrieve any worded character after Defn:
(.\ ) #end at the full stop and space
"""
for item in re.finditer(pattern,all_words,re.VERBOSE):
    print(item.groupdict())

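I am struggling with the newline characters here. I am trying to isolate the capital letters, then start right at the newline and ignore every character up to the two newlines before 'Defn:', and retrieve the definition ending at the full stop.

Is there a way to handle newline characters this way?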

You mostly have it; you are only missing a non-greedy match and a wider set of characters for the definition.
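To see what the non-greedy match buys you, here is a minimal sketch on a shortened, made-up sample string (not the dictionary file itself; the names sample, no_dotall, greedy and lazy are only for illustration). It compares a plain ".*", a greedy "[\s\S]*" and a lazy "[\s\S]*?":

```python
import re

# Hypothetical two-entry sample, shortened for illustration only.
sample = "AAA\nx\n\nDefn: first.\n\nBBB\ny\n\nDefn: second.\n\n"

# "." does not match a newline by default, so ".*" cannot get from the
# headword to "Defn:" and nothing is found.
no_dotall = re.findall(r"([A-Z]{3,}).*Defn: (\w+)", sample)

# [\s\S]* crosses newlines, but greedily: it runs on to the LAST "Defn:",
# so the first headword gets paired with the second definition.
greedy = re.findall(r"([A-Z]{3,})[\s\S]*Defn: (\w+)", sample)

# [\s\S]*? is lazy: it stops at the FIRST "Defn:" after each headword.
lazy = re.findall(r"([A-Z]{3,})[\s\S]*?Defn: (\w+)", sample)

print(no_dotall)  # []
print(greedy)     # [('AAA', 'second')]
print(lazy)       # [('AAA', 'first'), ('BBB', 'second')]
```

Passing re.DOTALL and writing ".*?" should behave the same as "[\s\S]*?" here; which one to use is mostly a matter of taste.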

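import re

all_words = """ABACINATE\nA*bac"i*nate, v.t. Etym: [LL. abacinatus, p.p. of abacinare; ab off +\nbacinus a basin.]\n\nDefn: To blind by a red-hot metal plate held before the eyes. [R.]\n\nABACINATION\nA*bac`i*na"tion, n.\n\nDefn: The act of abacinating. [R.]\n\n"""

# Raw string, so that \s and \S reach the regex engine untouched.
pattern = r"""
(?P<word>[A-Z*]{3,})          # headword: three or more capitals or asterisks
[\s\S]*?Defn:\s*              # lazily skip everything, newlines included, up to Defn:
(?P<definition>[a-zA-Z -]*)   # definition: letters, spaces and hyphens
"""
for item in re.finditer(pattern, all_words, re.VERBOSE):
    print(item.groupdict())

{'word': 'ABACINATE', 'definition': 'To blind by a red-hot metal plate held before the eyes'}
{'word': 'ABACINATION', 'definition': 'The act of abacinating'}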
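Note that the pattern above stops before the full stop, while the desired output in the question keeps it. One possible variant, offered as a sketch rather than as part of the original answer, captures everything up to and including the first full stop with "[^.]*\.". It assumes that each headword sits alone on its own line, that "Defn:" starts a line, and that no definition contains an abbreviation with an internal period; none of this is guaranteed for the full dictionary file.

```python
import re

# Same sample text as in the answer above.
all_words = (
    'ABACINATE\nA*bac"i*nate, v.t. Etym: [LL. abacinatus, p.p. of abacinare; ab off +\n'
    'bacinus a basin.]\n\nDefn: To blind by a red-hot metal plate held before the eyes. [R.]\n\n'
    'ABACINATION\nA*bac`i*na"tion, n.\n\nDefn: The act of abacinating. [R.]\n\n'
)

pattern = re.compile(r"""
    ^(?P<word>[A-Z*]{3,})$        # headword alone on its own line (assumption)
    [\s\S]*?                      # lazily skip the pronunciation and etymology lines
    ^Defn:\s*                     # marker that starts the definition
    (?P<definition>[^.]*\.)       # everything up to and including the first full stop
""", re.VERBOSE | re.MULTILINE)

# Collect one dict per entry instead of only printing them.
entries = [m.groupdict() for m in pattern.finditer(all_words)]
for entry in entries:
    print(entry)
```

With re.MULTILINE, "^" and "$" match at every line start and end, which keeps short capitalised fragments inside the etymology line from being picked up as headwords.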

