Regular expression to extract all sentences that start and end with the same word
Given a string of sentences, I need to extract a list of all of the sentences which start and end with the same word.
For example:
# sample text
text = "This is a sample sentence. well, I'll check that things are going well. another sentence starting with another. ..."
# required result
[
"well, I'll check that things are going well",
"another sentence starting with another"
]
How can I make the match using backreferences and also capture the full sentence?
I have tried the following regex, but it's not working:
re.findall("^[a-zA-Z](.*[a-zA-Z])?$", text)
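Running that attempt against the sample text shows why it fails. Without re.MULTILINE, ^ and $ anchor to the start and end of the whole string rather than to each sentence, and nothing in the pattern ties the last word back to the first. Also, because the pattern contains one capturing group, re.findall would return only that group's text, not the whole match. A minimal check, reusing the sample string from the question:

```python
import re

text = "This is a sample sentence. well, I'll check that things are going well. another sentence starting with another. ..."

# ^ and $ apply to the whole string here (no re.MULTILINE), and no
# backreference compares the last word with the first one.
attempt = re.findall("^[a-zA-Z](.*[a-zA-Z])?$", text)
print(attempt)  # [] because the string ends with "..." rather than a letter
```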
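Alternatively, without a backreference, split the text into sentences and compare the first and last word of each:
import re

text = ("This is a sample sentence. going to checking whether it is well going. "
        "another sentence starting with another.")
# Split on runs of sentence-ending punctuation
sentences = re.split(r'[.!?]+', text)
result = []
for s in sentences:
    # Take words without surrounding punctuation, so "well," compares equal to "well"
    words = re.findall(r"[\w']+", s)
    if words and words[0] == words[-1]:
        result.append(s.strip())
print(result)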
You could try using a backreference to reuse the first matched word:
import re
# sample text
text = "This is a sample sentence. Well, I'll check that things are going well. another sentence starting with another. ..."
print([match[0] for match in re.findall(r"((\b\w+\b)[^.?!]+\2[.?!])", text, re.IGNORECASE)])
This prints:
["Well, I'll check that things are going well.", 'another sentence starting with another.']
Note: I changed the case of the first "well" to "Well" for testing purposes.
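One caveat with the backreference pattern above: the match is not anchored to the start of a sentence, and \2 is not required to be a whole word, so it can also return a fragment such as "it is it." from the middle of "He said it is it.". The sketch below is one possible tightening, not the answerer's code. It assumes a sentence begins at the start of the string or after . ? ! followed by whitespace, and it leaves the closing punctuation outside group 1, so the output matches the required result from the question.

```python
import re

text = ("This is a sample sentence. well, I'll check that things are going well. "
        "another sentence starting with another. ...")

# A sentence starts at the beginning of the string or after . ? ! plus
# whitespace; \b before the backreference \2 forces the last word to be a
# whole word, and the closing punctuation stays outside group 1.
pattern = re.compile(r"(?:^|(?<=[.?!])\s+)((\w+)\b[^.?!]*\b\2)[.?!]", re.IGNORECASE)

result = [m.group(1) for m in pattern.finditer(text)]
print(result)

# The unanchored pattern picks up "it is it." from mid-sentence; this one does not.
loose = [m[0] for m in re.findall(r"((\b\w+\b)[^.?!]+\2[.?!])", "He said it is it.", re.IGNORECASE)]
anchored = [m.group(1) for m in pattern.finditer("He said it is it.")]
print(loose, anchored)
```

The lookbehind (?<=[.?!]) has a fixed width of one character, which Python's re module requires; the variable-length whitespace is matched outside it by \s+.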