
Find next/previous string after match with Python regex

I need to find the names of people mentioned in a text, and I need to filter all the names using a list of keywords such as:

key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"...]

For example, in the text:

INPUT: "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO 
and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "

OUTPUT:
(magistrate, DANIEL SMITH)
(officer, MARCO ANTONIO)
(defendant, WILL SMITH)
(plaintiff, MARIA FREEMAN)

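So I have two problems: first, how to handle a name that is mentioned before the keyword, and second, how to build a regex that uses all the keywords and filters at the same time.

Here is something I have tried: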

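import re

# text holds the INPUT string shown above
line = re.split("magistrate", text)[1]
name = []
for key in line.split():
    # collect the consecutive all-uppercase words that follow the keyword
    if key.isupper():
        name.append(key)
    else:
        break
" ".join(name)
OUTPUT: 'DANIEL SMITH'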

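Thanks!

Is using a regex mandatory? If not, here is my answer, since this can still be done without one.

You can use the split() method to split the line on spaces. It returns a list; assign that list to a variable and iterate over it. Try this: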

key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"]

line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN"
line_words = line.split(" ")

# enumerate() gives the position of each occurrence;
# list.index() would always return the first occurrence of a repeated keyword
for index, word in enumerate(line_words):
    if word in key_words:
        # print the keyword and the (up to) two words that follow it
        print(word, *line_words[index + 1:index + 3])
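The split-based idea can be combined with the uppercase check from the question so that a name of any length is collected. The following is only a sketch under assumptions that come from the sample sentence: names are written entirely in uppercase and directly follow the keyword. The helper name names_after_keywords is made up for illustration.

```python
import string

key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]

def names_after_keywords(text, keywords):
    """Return (keyword, name) pairs; name is the run of uppercase words after a keyword."""
    words = text.split()
    pairs = []
    for index, word in enumerate(words):
        keyword = word.strip(string.punctuation)
        if keyword not in keywords:
            continue
        name = []
        for candidate in words[index + 1:]:
            cleaned = candidate.strip(string.punctuation)
            # Stop at the first word that is not all uppercase.
            if not cleaned.isupper():
                break
            name.append(cleaned)
            # Punctuation attached to the word (e.g. a trailing comma) ends the name.
            if candidate != cleaned:
                break
        if name:
            pairs.append((keyword, " ".join(name)))
    return pairs

line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN"
print(names_after_keywords(line, key_words))
```

Like the loop above, this only looks forward from the keyword, so a name that comes before its keyword ("WILL SMITH, defendant") is not picked up.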

I suggest using re.findall with two capture groups, as follows:

import re
key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"]
line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "
found = re.findall('('+'|'.join(key_words)+')'+r'\s+([ A-Z]+[A-Z])',line)
print(found)

Output:

[('magistrate', 'DANIEL SMITH'), ('officer', 'MARCO ANTONIO'), ('plaintiff', 'MARIA FREEMAN')]

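Explanation: using multiple capture groups (written with parentheses) in the pattern passed to re.findall produces a list of tuples (2-tuples in this case). The first group is built simply by joining the keywords with |, which works like OR inside a pattern. Then comes one or more whitespace characters (\s+); this sits outside any group, so it does not appear in the result. Finally, the second group consists of one or more spaces or ASCII uppercase letters ([ A-Z]+) followed by a single ASCII uppercase letter ([A-Z]), so it does not capture a trailing space.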
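The pattern above only finds names that follow a keyword, which is why (defendant, WILL SMITH) is missing from its output. One possible way to cover the other direction is to add a second alternative that matches a name, a comma, and then the keyword. This is a sketch, not a general solution: it assumes names are ASCII uppercase words separated by single spaces, and that a name placed before its keyword is separated from it by a comma, as in the sample sentence. The names extract, key_a, name_a, key_b and name_b are chosen here for illustration.

```python
import re

key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]

# re.escape guards against keywords that contain regex metacharacters.
keys = "|".join(map(re.escape, key_words))
# One or more uppercase words separated by single spaces.
name = r"[A-Z]+(?: [A-Z]+)*"

pattern = re.compile(
    rf"\b(?P<key_a>{keys})\s+(?P<name_a>{name})\b"    # keyword, then name
    rf"|\b(?P<name_b>{name}),\s+(?P<key_b>{keys})\b"  # name, comma, then keyword
)

def extract(text):
    """Return (keyword, name) pairs in the order they occur in text."""
    pairs = []
    for match in pattern.finditer(text):
        if match.group("key_a"):
            pairs.append((match.group("key_a"), match.group("name_a")))
        else:
            pairs.append((match.group("key_b"), match.group("name_b")))
    return pairs

line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "
print(extract(line))
```

Named groups are used instead of re.findall here because findall would return 4-tuples with empty strings for the alternative that did not match.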
