Find all sentences containing specific words

I have a string made up of sentences and want to find all sentences that contain at least one of a set of specific keywords, i.e. keyword1 or keyword2.

import re

s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "

pattern = re.compile(r"([A-Z][^\.!?].*(keyword1)|(keyword2).*[\.!?])\s")
for match in pattern.findall(s):
    print(match)

Output:

('This is a sentence which contains keyword1', 'keyword1', '')
('keyword2 is inside this sentence. ', '', 'keyword2')

Expected output:

('This is a sentence which contains keyword1', 'keyword1', '')
('And keyword2 is inside this sentence. ', '', 'keyword2')

As you can see, the second match does not contain the whole sentence in the first group. What am I missing here?

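The alternation in your pattern is not grouped, so the | splits everything inside the outer parentheses into two branches: [A-Z][^\.!?].*(keyword1) or (keyword2).*[\.!?]. That is why the second match starts at keyword2. You can use a negated character class so that the match does not run past . ! or ?, and put the keyword alternation in a group of its own. The pattern below still gives each keyword its own capture group, so the group for the keyword that did not match comes back as an empty string; putting both keywords in one capture group, (keyword1|keyword2), prevents those empty strings.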

re.findall then returns the capture group values: group 1 for the whole sentence, and groups 2, 3, etc. for the individual keywords.

([A-Z][^.!?]*(?:(keyword1)|(keyword2))[^.!?]*[.!?])\s

Explanation

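  • ( Capture group 1
    • [A-Z][^.!?]* Match an uppercase character A-Z, then optionally any characters except . ! ?
    • (?:(keyword1)|(keyword2)) Capture one of the keywords in its own group
    • [^.!?]*[.!?] Match any characters except . ! ? and then one of . ! ?
  • ) Close group 1
  • \s Match a single whitespace character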
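
The effect of the non-capturing group (?:...) can be checked in isolation. The sketch below is only an illustration: the names loose and tight are made up here, and the keywords are left uncaptured so that findall returns plain strings.

```python
import re

s = "And keyword2 is inside this sentence. "

# Without an inner group, | splits everything inside the outer parentheses:
#   left branch  = [A-Z][^.!?]*keyword1
#   right branch = keyword2[^.!?]*[.!?]
loose = re.compile(r"([A-Z][^.!?]*keyword1|keyword2[^.!?]*[.!?])\s")

# With (?:...), | only chooses between the two keywords.
tight = re.compile(r"([A-Z][^.!?]*(?:keyword1|keyword2)[^.!?]*[.!?])\s")

print(loose.findall(s))  # ['keyword2 is inside this sentence.']
print(tight.findall(s))  # ['And keyword2 is inside this sentence.']
```

The loose pattern loses the leading "And " because its right-hand branch begins directly at keyword2.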


Example

import re

s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "

pattern = re.compile(r"([A-Z][^.!?]*(?:(keyword1)|(keyword2))[^.!?]*[.!?])\s")
for match in pattern.findall(s):
    print(match)

Output

('This is a sentence which contains keyword1.', 'keyword1', '')
('And keyword2 is inside this sentence.', '', 'keyword2')
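
If the empty strings in the tuples are unwanted, one possible variant is to put both keywords in a single capture group. This is a minimal sketch on the same sample string, not part of the pattern above:

```python
import re

s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "

# Both keywords share one capture group, so each result tuple has exactly
# two items: the sentence and the keyword that was found in it.
pattern = re.compile(r"([A-Z][^.!?]*(keyword1|keyword2)[^.!?]*[.!?])\s")
for sentence, keyword in pattern.findall(s):
    print((sentence, keyword))
```

This prints ('This is a sentence which contains keyword1.', 'keyword1') and ('And keyword2 is inside this sentence.', 'keyword2').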

You can try the following regular expression:

[.?!]*\s*(.*(keyword1)[^.?!]*[.?!]|.*(keyword2)[^.?!]*[.?!])

Code:

import re

s = "This is a sentence which contains keyword1. And keyword2 is inside this sentence. "

pattern = re.compile(r"[.?!]*\s*(.*(keyword1)[^.?!]*[.?!]|.*(keyword2)[^.?!]*[.?!])")
for match in pattern.findall(s):
    print(match)

Output:

('This is a sentence which contains keyword1.', 'keyword1', '')
('And keyword2 is inside this sentence.', '', 'keyword2')
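
One caveat worth checking before using this pattern: the leading .* is not restricted to a single sentence, so a sentence without a keyword can be pulled into the match that follows it. The sketch below compares both patterns on a made-up input; s2, with_negated_class and with_dot_star are names chosen here for illustration.

```python
import re

# Hypothetical input: the first sentence contains no keyword.
s2 = "Nothing here. This has keyword1. "

# Pattern using negated character classes on both sides of the keyword.
with_negated_class = re.compile(
    r"([A-Z][^.!?]*(?:(keyword1)|(keyword2))[^.!?]*[.!?])\s"
)
# Pattern using .* before the keyword.
with_dot_star = re.compile(
    r"[.?!]*\s*(.*(keyword1)[^.?!]*[.?!]|.*(keyword2)[^.?!]*[.?!])"
)

print(with_negated_class.findall(s2))
# [('This has keyword1.', 'keyword1', '')]
print(with_dot_star.findall(s2))
# [('Nothing here. This has keyword1.', 'keyword1', '')]
```

On the sample string from the question the two patterns agree, because every sentence there contains a keyword.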
