Find next/previous string after match python regex

Question

I need to find the names of persons mentioned in a text, filtering them with a list of key_words, for example:

key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff",...]

For example, in the text:

INPUT: "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO 
and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "

OUTPUT:
(magistrate, DANIEL SMITH)
(officer, MARCO ANTONIO)
(defendant, WILL SMITH)
(plaintiff, MARIA FREEMAN)

So I have two problems: first, how to handle a name that is mentioned before the keyword; second, how to build a regex that uses all the keywords and filters at the same time.

Here is what I have tried:

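import re

# text is the input string shown above
line = re.split("magistrate", text)[1]   # everything after the first "magistrate"
name = []
for key in line.split():
    if key.isupper():
        name.append(key)   # collect the uppercase words that follow
    else:
        break              # stop at the first non-uppercase word
" ".join(name)
OUTPUT: 'DANIEL SMITH'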
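The attempt above only looks after "magistrate". As a minimal sketch (the variable names `parts` and `found` are illustrative, not from the question), the same uppercase-word scan can be applied to every keyword at once by splitting on an alternation of all the keywords wrapped in a capturing group: re.split then keeps each matched keyword in the result list, so keywords sit at the odd indices and the text that follows each one sits at the next index. Like the original attempt, this only finds names that come after the keyword.

```python
import re

key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]
text = ("The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer "
        "MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN ")

# The capturing group makes re.split keep each keyword:
# [before, kw1, after1, kw2, after2, ...]
parts = re.split("(" + "|".join(map(re.escape, key_words)) + ")", text)

found = []
for i in range(1, len(parts), 2):      # odd indices hold the matched keywords
    name = []
    for word in parts[i + 1].split():  # same scan as the original attempt
        if word.isupper():
            name.append(word)
        else:
            break
    if name:                           # e.g. "defendant of ..." yields no name
        found.append((parts[i], " ".join(name)))

print(found)
```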

Thank you!

Answer 1

Is it compulsory to use regex? If not, here is my answer, since this can also be done without regex.

You can split the line on whitespace using the split() method, which returns a list; assign it to a variable and iterate through it. When a word is one of the key_words, the two words after it are printed as the name. Try this:

key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"]

line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN"
line_words = line.split()

# enumerate() gives the position of every occurrence; list.index() would always return the first one
for i, word in enumerate(line_words):
    if word in key_words:
        # print the keyword followed by the (up to) two words after it
        print(word, " ".join(line_words[i+1:i+3]))
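This loop only reads the words after a keyword, so for "defendant" it prints "defendant of the". A minimal non-regex sketch of an extension, assuming a name is a run of all-uppercase words and that words after the keyword take priority: if no uppercase word follows the keyword, it walks backwards instead. `is_name_word` is a hypothetical helper introduced here, and commas are stripped so "SMITH," counts as a name word.

```python
key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]
line = ("The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer "
        "MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN")

# Strip surrounding punctuation so "SMITH," compares as "SMITH"
words = [w.strip(",.;:") for w in line.split()]


def is_name_word(w):
    """Hypothetical helper: treat an all-uppercase alphabetic token as part of a name."""
    return w.isalpha() and w.isupper()


found = []
for i, word in enumerate(words):
    if word not in key_words:
        continue
    # 1) Look forward: the run of name words right after the keyword
    j = i + 1
    while j < len(words) and is_name_word(words[j]):
        j += 1
    if j > i + 1:
        found.append((word, " ".join(words[i + 1:j])))
        continue
    # 2) Otherwise look backward: the run of name words right before the keyword
    k = i
    while k > 0 and is_name_word(words[k - 1]):
        k -= 1
    if k < i:
        found.append((word, " ".join(words[k:i])))

print(found)
```

The "forward first" rule is a heuristic: a name sitting between two keywords is attributed to the keyword before it.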

Answer 2

I suggest using re.findall with two capture groups, as follows:

import re
key_words = ["magistrate","officer","attorney","applicant","defendant","plaintiff"]
line = "The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN "
# group 1: any keyword; \s+: whitespace (not captured); group 2: the uppercase name
found = re.findall('('+'|'.join(key_words)+')'+r'\s+([ A-Z]+[A-Z])',line)
print(found)

Output:

[('magistrate', 'DANIEL SMITH'), ('officer', 'MARCO ANTONIO'), ('plaintiff', 'MARIA FREEMAN')]

Explanation: when the pattern passed to re.findall contains multiple capturing groups (denoted by ( and )), the result is a list of tuples (2-tuples in this case). The first group is created by joining the keywords with |, which works like OR in a pattern. Next comes one or more whitespace characters (\s+), which is outside any group and thus does not appear in the result. Finally, the second group consists of one or more spaces or ASCII uppercase letters ([ A-Z]+) followed by a single ASCII uppercase letter ([A-Z]), so it does not capture a trailing space.
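The pattern above skips "defendant" because WILL SMITH appears before that keyword. A sketch of one way to cover both orders in a single pattern, assuming a name is a run of uppercase words and that a name written before a keyword may be followed by a comma: two alternatives with distinct named groups (kw1/after and kw2/before, names chosen here for illustration) used with re.finditer. re.escape protects keywords that contain regex metacharacters, and the name sub-pattern `[A-Z]+(?: [A-Z]+)*` is another way of never ending on a space.

```python
import re

key_words = ["magistrate", "officer", "attorney", "applicant", "defendant", "plaintiff"]
line = ("The magistrate DANIEL SMITH blalblablal, who was in a meeting with the officer "
        "MARCO ANTONIO and WILL SMITH, defendant of the judgment filed by the plaintiff MARIA FREEMAN ")

# re.escape guards against keywords containing regex metacharacters
kw = "|".join(map(re.escape, key_words))
# One or more all-uppercase words separated by single spaces (never ends on a space)
name = r"[A-Z]+(?: [A-Z]+)*"

pattern = re.compile(
    rf"\b(?P<kw1>{kw})\s+(?P<after>{name})\b"      # "keyword NAME"
    rf"|\b(?P<before>{name}),?\s+(?P<kw2>{kw})\b"  # "NAME, keyword"
)

found = []
for m in pattern.finditer(line):
    if m.group("kw1"):
        found.append((m.group("kw1"), m.group("after")))
    else:
        found.append((m.group("kw2"), m.group("before")))

print(found)
```

Since finditer scans left to right and matches never overlap, a name between two keywords goes to whichever match starts first.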

2 answers

solution1
0 2020-08-13 13:33:25

solution2
0 2020-08-13 13:40:48
