Match everything delimited by another regex?
I'm currently trying to write a regex that will find all the sentences in a block of text, and so far I've got this:
(?=(?<!mr)\.|(?<!mrs)\.|\?|!)+
This finds everything that delimits a sentence. I want the regex to find everything contained between the matches of this regex, but I don't really know where to go from here.
What about this:
import re
pattern = r'(?=(?<!mr)\.|(?<!mrs)\.|\?|!)+' # I'm assuming this does what you say it does :)
text_block = """long block of sentences"""
sentences = re.split(pattern, text_block)
sentences will be a list containing the resulting substrings. re.split will split text_block up into different elements of the returned list. It splits at each point where pattern matches.
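To make the splitting behaviour concrete, here is a minimal sketch that uses a deliberately simplified delimiter pattern (the character class `[.?!]` is an assumption for illustration, not the pattern from the question). Note that the question's pattern is a zero-width lookahead, and `re.split` only splits on empty matches from Python 3.7 onward, so a pattern that consumes the delimiter is the safer choice.

```python
import re

# Simplified stand-in for the real delimiter pattern (illustrative only).
text_block = "Hello there. How are you? Fine!"

# Each '.', '?' or '!' is consumed and the text around it is returned.
parts = re.split(r"[.?!]", text_block)

print(parts)  # ['Hello there', ' How are you', ' Fine', '']
```

The trailing empty string appears because the text ends with a delimiter; it can be filtered out afterwards.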
Read about re here:
https://docs.python.org/2/howto/regex.html
EDIT (data imported from your closed newer question):
If you are getting symbols like ?, ! etc. captured into your returned list as well, you should try removing the outer parens, like this:
re.split(r"\.(?<!mr)|\.(?<!mrs)|\?|!", somestring)
Example:
sentences = [s for s in re.split(r"\.(?<!mr)|\.(?<!mrs)|\?|!", somestring) if s]
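The reason the outer parentheses matter is that `re.split` also returns the text matched by any capturing group in the pattern. A small sketch with a made-up input string (`somestring` here is just sample data) shows the difference:

```python
import re

somestring = "Hi! Ok?"

# A capturing group makes re.split return the delimiters too.
with_group = re.split(r"(\?|!)", somestring)

# Without the group, only the text between delimiters comes back.
without_group = re.split(r"\?|!", somestring)

# Filtering drops the empty string left after the final delimiter.
cleaned = [s for s in without_group if s]

print(with_group)     # ['Hi', '!', ' Ok', '?', '']
print(without_group)  # ['Hi', ' Ok', '']
print(cleaned)        # ['Hi', ' Ok']
```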
(Moved from your closed newer question)
In your case, the lookbehinds should come before the periods.
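A quick way to see why the position matters, using a made-up sample string: a lookbehind placed after `\.` inspects text that already ends in the period, so it can never see "mr". Likewise, putting the two lookbehinds in separate alternatives lets either branch accept the period, while stacking them on the same period requires both to pass. This is only a sketch of the behaviour, not a full sentence splitter.

```python
import re

sample = "mr. Smith left. Bye"

# Lookbehind after the period: the preceding text ends in ".", never "mr",
# so the check always passes and every period splits.
after_period = re.split(r"\.(?<!mr)", sample)

# Lookbehind before the period: the period after "mr" is skipped.
before_period = re.split(r"(?<!mr)\.", sample)

# Separate alternatives: "mr." fails the first branch but passes the second.
separate = re.split(r"(?<!mr)\.|(?<!mrs)\.", sample)

# Stacked lookbehinds: both must pass for the same period.
stacked = re.split(r"(?<!mr)(?<!mrs)\.", sample)

print(after_period)   # ['mr', ' Smith left', ' Bye']
print(before_period)  # ['mr. Smith left', ' Bye']
print(separate)       # ['mr', ' Smith left', ' Bye']
print(stacked)        # ['mr. Smith left', ' Bye']
```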
Condensing your expression gives the pattern below.
Update: to get the text in between, you can just split on it, discarding the delimiters:
# (?:(?<!mr)(?<!mrs)\.|\?|!)+
(?:
(?<! mr )
(?<! mrs )
\.
| \?
| !
)+
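As a rough sketch of how the condensed pattern could be used from Python: the sample text, the `re.IGNORECASE` flag, and the whitespace stripping are assumptions added for illustration, not part of the original answer.

```python
import re

# Condensed delimiter pattern; IGNORECASE is an assumption so that
# "Mr." and "Mrs." are treated like "mr." and "mrs.".
DELIMS = re.compile(r"(?:(?<!mr)(?<!mrs)\.|\?|!)+", re.IGNORECASE)

text = "Mr. Smith met Mrs. Jones. Really?! Yes."

# Split on runs of delimiters, then drop empty pieces and stray spaces.
sentences = [s.strip() for s in DELIMS.split(text) if s.strip()]

print(sentences)  # ['Mr. Smith met Mrs. Jones', 'Really', 'Yes']
```

Because of the trailing `+`, a run such as `?!` is treated as a single delimiter rather than producing an empty piece between the two characters.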
Or, split keeping the delimiters:
# ((?:(?<!mr)(?<!mrs)\.|\?|!)+)
(
(?:
(?<! mr )
(?<! mrs )
\.
| \?
| !
)+
)
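With the outer capturing group, `re.split` returns the text pieces and the delimiters alternately, so they can be stitched back together. The following is a minimal sketch under the same assumptions as before (made-up sample text, `re.IGNORECASE` added for illustration):

```python
import re

# Same pattern wrapped in a capturing group so delimiters are kept.
KEEP = re.compile(r"((?:(?<!mr)(?<!mrs)\.|\?|!)+)", re.IGNORECASE)

text = "Mr. Smith left. Really?! Yes."

# parts alternates: text, delimiter, text, delimiter, ..., trailing text
parts = KEEP.split(text)

# Pair each text piece with the delimiter run that follows it.
sentences = [(body + delim).strip()
             for body, delim in zip(parts[0::2], parts[1::2])]

# Keep any trailing text that has no closing delimiter.
tail = parts[-1].strip()
if tail:
    sentences.append(tail)

print(sentences)  # ['Mr. Smith left.', 'Really?!', 'Yes.']
```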