
Find next word after the matching keyword in a list of strings using regex in Python

I have a list of strings and I want to extract the next word after a specific keyword in each string.

When I use filter with a lambda function over the list, I get the whole strings back instead of just the next word after the keyword:

import re
s = ["The job ABS is scheduled by Bob.", "The job BFG is scheduled by Alice."]
user = filter(lambda i:re.search('(?<=The job )(\w+)',i),s)
print(*user)

output: The job ABS is scheduled by Bob. The job BFG is scheduled by Alice.

But when I try the same regex on a single string, it gives the correct output:

import re
s = "The job ABS is scheduled by Bob."
user = re.search('(?<=The job )(\w+)',s)
print(user.group())

output: ABS

How can I get output like (ABS, BFG) from the first code snippet?

You can use

import re
s = ["The job ABS is scheduled by Bob.", "The job BFG is scheduled by Alice."]
rx = re.compile(r'(?<=The job )\w+')
# rx.search returns None when there is no match, so test the match object first
user = tuple(map(lambda x: x.group() if x else "", map(rx.search, s)))
print(user)

See the Python demo.

Alternatively, if there can be any amount of whitespace between the words, use

rx = re.compile(r'The\s+job\s+(\w+)')
user = tuple(map(lambda x: x.group(1) if x else "", map(rx.search, s)))

Output:

('ABS', 'BFG')

Here, map(rx.search, s) returns an iterator over the match objects (or None for strings that do not match), and the outer map(lambda x: ..., ...) takes the value of the group from each match (either the whole match with .group() or the Group 1 value with .group(1)) and substitutes an empty string when there was no match. The no-match check has to test the match object itself, as in x.group(...) if x else "", because calling .group() on None raises AttributeError, so x.group(...) or "" would fail before the or is ever evaluated.
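The same idea can be written without nested map calls. The following is only a sketch: the helper name next_word, its default parameter, and the third sample string (which deliberately lacks the keyword) are assumptions added for illustration and are not part of the original answer.

```python
import re

# Sample data from the question, plus one extra string (added here as an
# assumption) that does not contain the keyword at all.
s = [
    "The job ABS is scheduled by Bob.",
    "The job BFG is scheduled by Alice.",
    "Nothing is scheduled today.",
]

# \s+ tolerates any amount of whitespace; Group 1 captures the next word.
rx = re.compile(r"The\s+job\s+(\w+)")

def next_word(text, default=""):
    """Return the word after 'The job', or `default` if there is no match."""
    m = rx.search(text)  # a match object, or None when nothing matches
    return m.group(1) if m else default

user = tuple(next_word(text) for text in s)
print(user)  # ('ABS', 'BFG', '')
```

Pulling the None check into a small named function keeps the fallback in one place, which may be easier to read than a lambda if the pattern or the default value changes later.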

You can simplify this with re.findall in a list comprehension:

import re

arr = ["The job ABS is scheduled by Bob.", "The job BFG is scheduled by Alice."]
# re.findall returns a list of all matches for each string,
# so the result is a list of lists
user = [re.findall(r'(?<=The job )\w+', s) for s in arr]
print(user)
print(tuple(user))

Output:

[['ABS'], ['BFG']]
(['ABS'], ['BFG'])
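Note that this output is nested, one inner list per input string, whereas the question asks for a flat result like (ABS, BFG). One possible way to flatten it, sketched here as a variation rather than as part of the original answer, is a second for clause in the comprehension:

```python
import re

arr = ["The job ABS is scheduled by Bob.", "The job BFG is scheduled by Alice."]
rx = re.compile(r"(?<=The job )\w+")

# Iterate over every match of every string, so the result is one flat list.
# Strings without a match simply contribute nothing.
user = [word for s in arr for word in rx.findall(s)]
print(user)         # ['ABS', 'BFG']
print(tuple(user))  # ('ABS', 'BFG')
```

Unlike the map-based version, strings that do not match are skipped instead of producing an empty string, so the result can be shorter than the input list.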
