Find next/previous string after match python regex
I need to find the names of people mentioned in a text, and I need to filter all the names using a list of keywords, for example:
key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"...]
For example, in the text:
INPUT: "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO
and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "
OUTPUT:
(magistrate, DANIEL SMITH)
(officer, MARCO ANTONIO)
(defendant, WILL SMITH)
(plaintiff, MARIA FREEMAN)
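So I have two problems: first, the case where the name is mentioned before the keyword, and second, how to build a regex that uses all the keywords and filters at the same time.
I have tried a few things: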
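import re

# text holds the INPUT string shown above
line = re.split("magistrate", text)[1]
name = []
# collect the consecutive upper-case words that follow the keyword
for key in line.split():
    if key.isupper():
        name.append(key)
    else:
        break
" ".join(name)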
OUTPUT: 'DANIEL SMITH'
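Thanks!
Do you have to use regex? If not, here is my answer, since this can still be done without regex.
You can use the split() method to split the line on the space delimiter. This method returns a list; assign it to a variable and iterate over that list. Try this: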
key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"]
line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN"
line_words = line.split(" ")
# enumerate gives each word's own position; list.index() would always return the first occurrence
for index, word in enumerate(line_words):
    if word in key_words:
        # assumes the name is exactly the two words right after the keyword
        print(word, *line_words[index + 1:index + 3])
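The loop above prints the two words after each keyword regardless of what they are, so "defendant" is paired with "of the". A minimal sketch of a split-based variant that borrows the isupper() check from the question and only keeps the run of upper-case words following a keyword; the function name names_after_keywords and the punctuation set passed to strip() are assumptions made for illustration:

```python
key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]
line = ("The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer "
        "MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN")


def names_after_keywords(text, keys):
    """Return (keyword, name) pairs where the name directly follows the keyword."""
    words = text.split()
    results = []
    for i, word in enumerate(words):
        if word not in keys:
            continue
        name = []
        for following in words[i + 1:]:
            token = following.strip(",.;:")
            # stop at the first word that is not fully upper-case
            if not token.isupper():
                break
            name.append(token)
            # trailing punctuation marks the end of the name
            if token != following:
                break
        if name:
            results.append((word, " ".join(name)))
    return results


print(names_after_keywords(line, key_words))
```

This still only finds names that come after the keyword, so "WILL SMITH, defendant" is not picked up.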
I suggest using re.findall with two capturing groups, as follows:
import re
key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"]
line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "
found = re.findall('('+'|'.join(key_words)+')'+r'\s+([ A-Z]+[A-Z])',line)
print(found)
Output:
[('magistrate', 'DANIEL SMITH'), ('officer', 'MARCO ANTONIO'), ('plaintiff', 'MARIA FREEMAN')]
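Explanation: using multiple capturing groups (denoted by ( and )) in the pattern passed to re.findall results in a list of tuples (2-tuples in this case). The first group is created simply by joining the keywords with |, which works like OR in the pattern. Then we have one or more whitespace characters (\s+), which sit outside any group and therefore do not appear in the result. Finally, the second group consists of one or more spaces or ASCII upper-case letters ([ A-Z]+) followed by a single ASCII upper-case letter ([A-Z]), so it does not capture trailing spaces.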
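The pattern above only covers a name that follows its keyword, which is why "defendant" is missing from the output. One possible way to also cover the "WILL SMITH, defendant" case from the question is to add a second alternative with the order reversed. This is a sketch, not a general solution: the group names (key_first, name_after, name_before, key_last) are made up for illustration, and it assumes a name is a run of upper-case words separated by single spaces, optionally followed by a comma before the keyword.

```python
import re

key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]
line = ("The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer "
        "MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN ")

# re.escape keeps the alternation safe if a keyword ever contains regex metacharacters
keys = "|".join(map(re.escape, key_words))
# assumed name shape: upper-case words separated by single spaces
name = r"[A-Z]+(?: [A-Z]+)*"

pattern = re.compile(
    rf"\b(?P<key_first>{keys})\s+(?P<name_after>{name})\b"      # keyword NAME
    rf"|\b(?P<name_before>{name}),?\s+(?P<key_last>{keys})\b"   # NAME, keyword
)

pairs = []
for m in pattern.finditer(line):
    if m.group("key_first"):
        pairs.append((m.group("key_first"), m.group("name_after")))
    else:
        pairs.append((m.group("key_last"), m.group("name_before")))

print(pairs)
```

re.finditer is used instead of re.findall here so that each match can be inspected by group name and normalised to (keyword, name) order. Text where a name sits between two keywords is ambiguous and is resolved in favour of whichever alternative matches first.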