
Match everything delimited by another regex?

I'm currently trying to make a regex that will find all the sentences in a block of text, and so far I've got this:

(?=(?<!mr)\.|(?<!mrs)\.|\?|!)+

This finds everything that delimits a sentence. I want the regex to find everything contained between the things this regex matches, but I don't really know where to go from here.

What about this:

import re

pattern = r'(?=(?<!mr)\.|(?<!mrs)\.|\?|!)+' # I'm assuming this does what you say it does :)
text_block = """long block of sentences"""

sentences = re.split(pattern, text_block)

sentences will be a list containing the resulting substrings. re.split splits text_block into the elements of the returned list, cutting at each point where pattern matches.
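As a minimal sketch of how re.split behaves (the sample text is made up, and the pattern here is a simplified stand-in that consumes the delimiter), note that the matched delimiters are dropped and a trailing empty string appears when the text ends with a delimiter. Also note that splitting on a pattern that can match the empty string, such as a bare lookahead, is only supported from Python 3.7 onward.

```python
import re

# Hypothetical sample text, not from the question.
text_block = "Is it late? Yes! Go home."

# Split at each '.', '?' or '!'; the matched delimiter is consumed and dropped.
parts = re.split(r"[.?!]", text_block)
print(parts)

# Trim whitespace and drop the empty piece left after the final delimiter.
sentences = [p.strip() for p in parts if p.strip()]
print(sentences)
```

The first print shows the raw pieces, including the leading spaces and the empty last element; the second shows the cleaned list.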

Read about re here:

https://docs.python.org/2/howto/regex.html

EDIT (data imported from your closed newer question):

If you are getting symbols like ?, ! etc. captured into your returned list as well, you should try removing the outer parens. The lookbehinds also have to sit before the period, and both have to hold for the same period, like this:

re.split(r"(?<!mr)(?<!mrs)\.|\?|!", somestring)

Ex:

sentences = [s for s in re.split(r"(?<!mr)(?<!mrs)\.|\?|!", somestring) if s]
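To see why the position of the lookbehinds matters, here is a small check on a made-up sample string. It is only a sketch: it counts how many periods each variant treats as a delimiter.

```python
import re

# Hypothetical sample with four periods, two of them after titles.
sample = "mr. smith left. mrs. smith stayed."

# Lookbehind placed after the period: the two characters before the
# current position always end in ".", so they can never be "mr".
after = re.findall(r"\.(?<!mr)", sample)

# Two alternatives with one lookbehind each: "mrs." passes the "mr" check
# and "mr." passes the "mrs" check, so every period still matches.
either = re.findall(r"(?<!mr)\.|(?<!mrs)\.", sample)

# Both lookbehinds in front of the same period: each must pass.
both = re.findall(r"(?<!mr)(?<!mrs)\.", sample)

print(len(after), len(either), len(both))
```

Only the last variant skips the periods after the titles. The match is case-sensitive as written; passing re.IGNORECASE would be one way to cover "Mr." and "Mrs." too.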

(Moved from your closed newer question)
In your case, the lookbehinds should come before the periods.
Your expression can then be condensed into a single alternation, as in the patterns below.

Update: to get what lies between the delimiters, you can just split and discard the delimiters:

 # (?:(?<!mr)(?<!mrs)\.|\?|!)+

 (?:
      (?<! mr )
      (?<! mrs )
      \.
   |  \?
   |  !
 )+
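A short sketch of using the compact form of this pattern with re.split in Python (the sample text and variable names are assumptions for illustration):

```python
import re

# Compact form of the pattern above: one or more sentence delimiters,
# where a period does not count if it follows "mr" or "mrs".
delimiters = re.compile(r"(?:(?<!mr)(?<!mrs)\.|\?|!)+")

# Hypothetical sample text.
text = "hello mr. smith. how are you?! fine."

# Split on the delimiters, then trim whitespace and drop empty pieces.
pieces = [s.strip() for s in delimiters.split(text) if s.strip()]
print(pieces)
```

Because the group is non-capturing, the delimiters themselves do not appear in the result, and the run "?!" counts as a single delimiter thanks to the trailing +.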

Or split and keep the delimiters:

 # ((?:(?<!mr)(?<!mrs)\.|\?|!)+)

 (
      (?:
           (?<! mr )
           (?<! mrs )
           \.
        |  \?
        |  !
      )+
 )
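With the outer capturing group, re.split returns the delimiters as separate list items, alternating with the text. A sketch of gluing each delimiter back onto its sentence (sample text and names are again assumptions):

```python
import re

# Same pattern, wrapped in a capturing group so split() keeps the delimiters.
keep = re.compile(r"((?:(?<!mr)(?<!mrs)\.|\?|!)+)")

# Hypothetical sample text.
text = "hello mr. smith. how are you?! fine."

# parts alternates: text, delimiter, text, delimiter, ...
parts = keep.split(text)
print(parts)

# Pair each text piece with the delimiter that follows it.
# Trailing text with no closing delimiter would be dropped by zip().
sentences = [(body + delim).strip()
             for body, delim in zip(parts[0::2], parts[1::2])]
print(sentences)
```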
