Find all sentences containing specific words
I have a string made up of sentences and want to find all sentences that contain at least one of the specific keywords keyword1 or keyword2:
import re
s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "
pattern = re.compile(r"([A-Z][^\.!?].*(keyword1)|(keyword2).*[\.!?])\s")
for match in pattern.findall(s):
    print(match)
Output:
('This is a sentence which contains keyword1', 'keyword1', '')
('keyword2 is inside this sentence. ', '', 'keyword2')
Expected output:
('This is a sentence which contains keyword1', 'keyword1', '')
('And keyword2 is inside this sentence. ', '', 'keyword2')
As you can see, the second match does not contain the whole sentence in the first group. What am I missing here?
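You can use a negated character class so that the match cannot run past ., ! or ?, and put the keyword alternatives in a group of their own so that the | applies only to the keywords rather than splitting the whole pattern.
re.findall then returns the capture group values: group 1 is the whole sentence, and groups 2, 3, etc. hold whichever keyword matched (the others come back as empty strings).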
([A-Z][^.!?]*(?:(keyword1)|(keyword2))[^.!?]*[.!?])\s
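Explanation
(    Open capture group 1
[A-Z][^.!?]*    Match an uppercase character A-Z, then optionally any characters except ., ! or ?
(?:(keyword1)|(keyword2))    Capture one of the keywords in its own group
[^.!?]*[.!?]    Match any characters except ., ! or ?, then match one of ., ! or ?
)    Close group 1
\s    Match a whitespace character
Example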
import re
s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "
pattern = re.compile(r"([A-Z][^.!?]*(?:(keyword1)|(keyword2))[^.!?]*[.!?])\s")
for match in pattern.findall(s):
    print(match)
Output
('This is a sentence which contains keyword1.', 'keyword1', '')
('And keyword2 is inside this sentence.', '', 'keyword2')
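If the empty strings in the result tuples are unwanted, one option is to let both keywords share a single capture group. The following is a minimal sketch of that variation, not part of the answer above; it reuses the same sample string, and the names sentence and keyword are only illustrative.

```python
import re

s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "

# Group 1 is the whole sentence; group 2 is whichever keyword matched.
# Because both keywords share one group, findall yields 2-tuples with no empty strings.
pattern = re.compile(r"([A-Z][^.!?]*(keyword1|keyword2)[^.!?]*[.!?])\s")

matches = pattern.findall(s)
for sentence, keyword in matches:
    print(sentence, "->", keyword)
```

Note that if a sentence contains a keyword more than once, the greedy [^.!?]* before the group means the last occurrence is the one captured.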
You can try the following regular expression:
[.?!]*\s*(.*(keyword1)[^.?!]*[.?!]|.*(keyword2)[^.?!]*[.?!])
Code:
import re
s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "
pattern = re.compile(r"[.?!]*\s*(.*(keyword1)[^.?!]*[.?!]|.*(keyword2)[^.?!]*[.?!])")
for match in pattern.findall(s):
    print(match)
Output:
('This is a sentence which contains keyword1.', 'keyword1', '')
('And keyword2 is inside this sentence.', '', 'keyword2')
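One caveat with this pattern: the leading .* is not limited to a single sentence, so it may pull in earlier sentences that contain no keyword. The sketch below compares it with the negated-character-class pattern on a hypothetical input (the string and the names greedy and bounded are assumptions made for illustration).

```python
import re

# Hypothetical input: the first sentence contains no keyword.
s = "Nothing to see here. This one has keyword1. "

# Pattern from this answer: .* can run across sentence boundaries.
greedy = re.compile(r"[.?!]*\s*(.*(keyword1)[^.?!]*[.?!]|.*(keyword2)[^.?!]*[.?!])")
# Pattern using negated character classes: the match stays inside one sentence.
bounded = re.compile(r"([A-Z][^.!?]*(?:(keyword1)|(keyword2))[^.!?]*[.!?])\s")

# Keep only group 1 (the captured sentence text) from each result tuple.
greedy_sentences = [m[0] for m in greedy.findall(s)]
bounded_sentences = [m[0] for m in bounded.findall(s)]

print(greedy_sentences)
print(bounded_sentences)
```

On this input the first list holds both sentences joined as one match, while the second holds only the sentence that contains keyword1.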