
Find the next word after a matching keyword in a list of strings using regex in Python

I have a list of strings and I want to extract the next word after a specific keyword in each string.

When I use filter with a lambda function to iterate over the list, I get the whole strings back instead of just the next word after the keyword:

import re
s = ["The job ABS is scheduled by Bob.", "The job BFG is scheduled by Alice."]
user = filter(lambda i:re.search('(?<=The job )(\w+)',i),s)
print(*user)

output: The job ABS is scheduled by Bob. The job BFG is scheduled by Alice.

But when I try the same pattern on a single string, it gives the correct output:

import re
s = "The job ABS is scheduled by Bob."
user = re.search('(?<=The job )(\w+)',s)
print(user.group())

output: ABS

How can I get output like (ABS, BFG) from the first code snippet?

You can use

import re
s = ["The job ABS is scheduled by Bob.", "The job BFG is scheduled by Alice."]
rx = re.compile(r'(?<=The job )\w+')
user = tuple(map(lambda x: x.group() if x else "", map(rx.search, s)))
print(user)


Alternatively, if there can be any amount of whitespace, use

rx = re.compile(r'The\s+job\s+(\w+)')
user = tuple(map(lambda x: x.group(1) if x else "", map(rx.search, s)))

Output:

('ABS', 'BFG')

Here, map(rx.search, s) returns an iterator of match objects, or None for any string that does not match. The outer map(lambda ...) then takes the group value from each match (the whole match with .group(), or the Group 1 value with .group(1)). Because None has no .group() method, the fallback to an empty string has to test the match object itself before calling .group().
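The None case is easy to get wrong, since calling .group() on None raises AttributeError. Here is a minimal sketch of one way to guard against it, using a plain loop instead of nested map calls. The helper name next_word_after and the third sample string are made up for illustration and are not part of the original answer:

```python
import re

# Hypothetical helper, not from the original answer.
def next_word_after(keyword, strings, default=""):
    """Return the word following `keyword` in each string, or `default` if absent."""
    # re.escape keeps any regex metacharacters in the keyword literal.
    rx = re.compile(re.escape(keyword) + r"\s+(\w+)")
    results = []
    for text in strings:
        m = rx.search(text)
        # rx.search returns None when nothing matches, so test before calling .group().
        results.append(m.group(1) if m else default)
    return tuple(results)

s = [
    "The job ABS is scheduled by Bob.",
    "The job BFG is scheduled by Alice.",
    "Nothing is scheduled today.",  # illustrative string with no match
]
print(next_word_after("The job", s))
```

With the three sample strings this should print ('ABS', 'BFG', ''), the empty string standing in for the string that has no match.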

You can simplify this:

import re

arr = ["The job ABS is scheduled by Bob.", "The job BFG is scheduled by Alice."]
user = [re.findall(r'(?<=The job )\w+', s) for s in arr]
print(user)
print(tuple(user))

Output:

[['ABS'], ['BFG']]
(['ABS'], ['BFG'])
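Note that re.findall returns a list per string, so the result above is nested rather than the flat (ABS, BFG) the question asks for. One possible way to flatten it, sketched here with a nested generator expression (the variable name flat is just for illustration):

```python
import re

arr = ["The job ABS is scheduled by Bob.", "The job BFG is scheduled by Alice."]
rx = re.compile(r"(?<=The job )\w+")

# Take every match from every string, in order, into one flat tuple.
flat = tuple(word for s in arr for word in rx.findall(s))
print(flat)  # ('ABS', 'BFG')
```

A string with no match simply contributes nothing to the tuple, so no None check is needed with this approach.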


 