
Find next/previous string after match python regex

I need to find the names of the people mentioned in a text, and I need to filter all the names using a list of keywords, for example:

key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"...]

For example, in the text:

INPUT: "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO 
and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "

OUTPUT:
(magistrate, DANIEL SMITH)
(officer, MARCO ANTONIO)
(defendant, WILL SMITH)
(plaintiff, MARIA FREEMAN)

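So I have two problems: first, the name is sometimes mentioned before the keyword, and second, I don't know how to build a regex that uses all the keywords and filters at the same time.

I have tried a few things: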

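import re

# text holds the INPUT string shown above
line = re.split("magistrate", text)[1]
name = []
for key in line.split():
    # keep collecting words for as long as they are upper-case
    if key.isupper():
        name.append(key)
    else:
        break
" ".join(name)
OUTPUT: 'DANIEL SMITH'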

Thanks!

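Is it mandatory to use a regular expression? If not, here is my answer, since this can still be done without regex.

You can use the split() method to split the line on the space delimiter. The method returns a list; assign it to a variable and iterate over the list. Try this: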

key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]

line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN"
line_words = line.split(" ")

# enumerate() gives the position of each occurrence; list.index() would only ever find the first one
for i, word in enumerate(line_words):
    if word in key_words:
        # print the keyword and the two words after it (slicing avoids an IndexError near the end)
        print(word, *line_words[i + 1:i + 3])
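The loop above always prints exactly two words after a keyword, so "defendant" yields "of the" and a three-word name would be cut short. A minimal sketch of one way around that, assuming (as in the question's own attempt) that a name is a run of upper-case words directly after the keyword; the helper name names_after_keywords is hypothetical, not something from the thread:

```python
key_words = {"magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"}

line = ("The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer "
        "MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN")

PUNCT = ",.;:"

def names_after_keywords(text):
    """Hypothetical helper: return (keyword, name) pairs for names that FOLLOW a keyword."""
    words = text.split()
    pairs = []
    for i, word in enumerate(words):
        if word.strip(PUNCT) not in key_words:
            continue
        name = []
        for raw in words[i + 1:]:
            token = raw.strip(PUNCT)
            if not token.isupper():
                break              # first non-upper-case word ends the name
            name.append(token)
            if token != raw:
                break              # trailing punctuation also ends the name
        if name:                   # skip keywords with no name after them
            pairs.append((word.strip(PUNCT), " ".join(name)))
    return pairs

print(names_after_keywords(line))
```

This still only looks to the right of the keyword, so "WILL SMITH, defendant" is not picked up; it is skipped rather than reported with the wrong words.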

I suggest using re.findall with two capturing groups, as follows:

import re
key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"]
line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "
found = re.findall('('+'|'.join(key_words)+')'+r'\s+([ A-Z]+[A-Z])',line)
print(found)

Output:

[('magistrate', 'DANIEL SMITH'), ('officer', 'MARCO ANTONIO'), ('plaintiff', 'MARIA FREEMAN')]

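Explanation: using multiple capturing groups (denoted by ( and )) in the pattern passed to re.findall results in a list of tuples (2-tuples in this case). The first group is created simply by joining the keywords with |, which works like OR in a pattern. Then comes one or more whitespace characters ( \s+ ), which sits outside any group and therefore does not appear in the results. Finally, the second group consists of one or more spaces or ASCII upper-case letters ( [ A-Z]+ ) followed by a single ASCII upper-case letter ( [A-Z] ), so it does not capture a trailing space. Note that 'defendant' is missing from the output because in this text the name comes before that keyword, not after it.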
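The question also asks about names that come before the keyword ("WILL SMITH, defendant"), which the pattern above does not cover. One possible extension, offered as a sketch rather than as part of the original answer, is to add a second alternative for "NAME, keyword" and use named groups to tell the two orders apart. The group names key_a, name_a, name_b and key_b are made up for this illustration, and the sketch assumes names consist only of ASCII upper-case words separated by single spaces:

```python
import re

key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]
line = ("The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO "
        "and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN ")

keys = "|".join(map(re.escape, key_words))   # escape in case a keyword contains regex metacharacters
name = r"[A-Z]+(?: [A-Z]+)*"                 # one or more upper-case words

pattern = re.compile(
    rf"\b(?P<key_a>{keys})\s+(?P<name_a>{name})\b"       # keyword first, then the name
    rf"|\b(?P<name_b>{name}),?\s+(?P<key_b>{keys})\b"    # name first, optional comma, then the keyword
)

# Only one alternative takes part in each match; the other pair of groups is None.
pairs = [
    (m.group("key_a") or m.group("key_b"), m.group("name_a") or m.group("name_b"))
    for m in pattern.finditer(line)
]
print(pairs)
```

Matches are found left to right, so in a text like "officer JOHN DOE, defendant" the name would be attached to the keyword before it; resolving that kind of ambiguity needs more context than a regex can see.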
