Can I write a RegEx that matches a pattern where part of that pattern is a negated match?

Question

I want to write a RegEx to remove ellipses from a large text.

I need to find a series of two or more dots, possibly with spaces between them, possibly without. The RegEx I'm using also matches full stops that I don't want to remove, so I want part of the RegEx pattern to reject a match if it's followed by a particular string.

I've been using this pattern: re.compile(r'\.[ \.]*\.')

The problem is that the text contains some legitimate abbreviations, and this pattern catches them too.

Take this text for example:

1. Here are ... some . . ellipses..
2. This. . .is ellipsis also.
3. Here is an abbreviation. .i.

In the example above, I want my pattern to find only the ... , . . , .. , and . . . in lines 1 and 2. I don't want it to find anything in line 3; however, it will find . . in it.

I could update the RegEx to exclude matches that are preceded or followed by the letter i, like this: re.compile(r'[^i]\.[ \.]*\.[^i]'), but then the pattern won't find the ellipsis in line 2.

Ideally I'd be able to negate a whole sub-string within the pattern so that it won't treat . . as an ellipsis if it's followed by i. or preceded by .i; however, I haven't been able to find any way to do this. Is it possible?

Answer 1

Use a negative lookahead and a negative lookbehind:

import re

text = """
1. Here are ... some . . ellipses..
2. This. . .is ellipsis also.
3. Here is an abbreviation. .i.
"""

pattern = re.compile(r'(?<!\.i)\.[ \.]*\.(?!i\.)')
print(pattern.findall(text))   # ['...', '. .', '..', '. . .']
print(pattern.sub('', text))

Text after removing the dot sequences:

1. Here are  some  ellipses
2. Thisis ellipsis also.
3. Here is an abbreviation. .i.
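
For comparison, here is a minimal sketch of why the [^i] attempt from the question falls short while the lookarounds work. A negated character class has to consume a real character on each side of the dots, whereas lookarounds are zero-width assertions that only inspect the surrounding text. The variable names are illustrative only.

```python
import re

line = "2. This. . .is ellipsis also."

# [^i] on each side must consume one actual character next to the dots,
# so the neighbours become part of the match.
consuming = re.compile(r'[^i]\.[ \.]*\.[^i]')

# Lookbehind/lookahead only assert what is (not) around the dots;
# they add nothing to the matched text.
lookaround = re.compile(r'(?<!\.i)\.[ \.]*\.(?!i\.)')

consuming_matches = consuming.findall(line)
lookaround_matches = lookaround.findall(line)

print(consuming_matches)   # ['s. . ']
print(lookaround_matches)  # ['. . .']
```

With the consuming pattern, the match swallows the s before the dots and stops short of the last dot, so substituting it away would damage the surrounding words.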

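To skip only a sequence of dots that is followed by i., the lookahead must include the character after the i (the trailing dot, giving i\.) rather than i alone. Otherwise the lookahead would also fire on an ellipsis that is simply followed by a word starting with i, such as: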

     . . .is
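
A small sketch of that difference, assuming a hypothetical variant of the pattern whose lookahead is just (?!i). The names strict and loose are made up for illustration.

```python
import re

line = "2. This. . .is ellipsis also."

# Pattern from this answer: the lookahead rejects only a following "i."
strict = re.compile(r'(?<!\.i)\.[ \.]*\.(?!i\.)')

# Hypothetical looser variant: the lookahead rejects any following "i"
loose = re.compile(r'(?<!\.i)\.[ \.]*\.(?!i)')

strict_matches = strict.findall(line)
loose_matches = loose.findall(line)
loose_result = loose.sub('', line)

print(strict_matches)  # ['. . .']
print(loose_matches)   # ['. .']
print(loose_result)    # 2. This .is ellipsis also.
```

The looser variant does not fail outright: the engine backtracks to a shorter match that is not followed by i, which leaves a stray dot behind after substitution.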

Answer 1
2 Accepted 2019-10-08 13:17:38
