
Regular expression to extract all sentences that start and end with the same word

Given a string of sentences, I need to extract a list of all the sentences that start and end with the same word.

For example:

# sample text
text = "This is a sample sentence. well, I'll check that things are going well. another sentence starting with another. ..."

# required result
[
 "well, I'll check that things are going well",
 "another sentence starting with another"
]

How can I make the match using back references and also capture the full sentence?

I have tried the following regex, but it's not working.

re.findall("^[a-zA-Z](.*[a-zA-Z])?$", text)

import re

text = "This is a sample sentence. going to checking whether it is well going. another sentence starting with another."

sentences = re.split('[.!?]+', text)
result = []

for s in sentences:
    words = s.split()
    if len(words) > 0 and words[0] == words[-1]:
        result.append(s.strip())

print(result)
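
Note that `words[0] == words[-1]` compares the raw tokens, so on the question's original sample the first token `well,` (with its comma) would not equal `well`. A minimal sketch of one way around that, assuming it is acceptable to strip surrounding punctuation and ignore case before comparing; the helper name `first_last_match` is made up for illustration:

```python
import re
import string

def first_last_match(sentence):
    # Normalise each token: drop leading/trailing punctuation and lowercase it.
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    # Discard tokens that were punctuation only.
    words = [w for w in words if w]
    return len(words) > 0 and words[0] == words[-1]

text = "This is a sample sentence. well, I'll check that things are going well. another sentence starting with another. ..."

# Split on runs of terminal punctuation, then keep the qualifying sentences.
result = [s.strip() for s in re.split(r"[.!?]+", text) if first_last_match(s)]
print(result)
```

As in the loop above, a one-word sentence still counts as a match here; change the length check to `len(words) > 1` if that is not wanted.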

You could try using a backreference to reuse the match:

import re

# sample text
text = "This is a sample sentence. Well, I'll check that things are going well. another sentence starting with another. ..."

print([match[0] for match in re.findall(r"((\b\w+\b)[^.?!]+\2[.?!])", text, re.IGNORECASE)])

This prints:

["Well, I'll check that things are going well.", 'another sentence starting with another.']

Note: I changed the case of the first "well" to "Well" for testing purposes.
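
Two edge cases are worth keeping in mind with the pattern above: the backreference `\2` has no word boundary in front of it, so "a big banana." is matched (the final "a" of "banana" satisfies `\2`), and a match may begin in the middle of a sentence, so "I think that is that." yields "that is that.". A hedged sketch of a stricter variant follows; it assumes sentences end in `.`, `?` or `!`, and the names `SAME_WORD_SENTENCE` and `same_word_sentences` are hypothetical:

```python
import re

SAME_WORD_SENTENCE = re.compile(
    r"(?:^|(?<=[.?!]))\s*"   # begin only at the start of the text or right after a terminator
    r"((\w+)\b[^.?!]+\b\2)"  # group 1: the sentence; group 2: its first word, repeated as the last word
    r"[.?!]",                # the terminator that closes the sentence
    re.IGNORECASE,
)

def same_word_sentences(text):
    # Return each matching sentence without its closing punctuation.
    return [m.group(1) for m in SAME_WORD_SENTENCE.finditer(text)]

text = "This is a sample sentence. Well, I'll check that things are going well. another sentence starting with another. ..."
print(same_word_sentences(text))
```

Because group 1 stops before the terminator, the returned sentences have no trailing period, which is the shape the question asks for. Like the original pattern, this variant does not match one-word sentences.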
