Python Regex: match a string that is not preceded or followed by a word containing digits

Question

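I want to use a regular expression in Python to replace a string that is not preceded or followed by a word containing digits.

For example, given the following sentence:

Today is 4th April. Her name is April. Tomorrow is April 5th.

I want to match only the April in the second sentence and replace it with "PERSON", so the result should look like this:

Today is 4th April. Her name is PERSON. Tomorrow is April 5th.

I tried using this regex: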

(\w*(?<!\w*\d\w*\s)April(?!\s\w*\d\w*))

However, I get an error saying:

error: look-behind requires fixed-width pattern
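For reference, the error can be reproduced without any input text, because it is raised when the pattern is compiled. The snippet below is only a minimal sketch; the variable names are illustrative and not part of the original post.

```python
import re

# The pattern from the question. The lookbehind contains \w*, so its width varies,
# which the standard re module rejects at compile time.
pattern = r"(\w*(?<!\w*\d\w*\s)April(?!\s\w*\d\w*))"

try:
    re.compile(pattern)
    message = None
except re.error as exc:
    message = str(exc)

print(message)
```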

Any help is appreciated.

Answer 1

Here is one regular expression you could use:

(?:^\s*|[^\w\s]+\s*|\b[^\d\s]+\s+)(April)\b(?!\s*\w*\d)

Set the case-insensitive flag. The target word is captured in capture group 1.


Python's regex engine performs the following operations:

(?:           # begin non-cap grp
  ^           # match beginning of line
  \s*         # match 0+ whitespace characters
  |           # or
  [^\w\s]+    # match 1+ chars other than word chars and whitespace
  \s*         # match 0+ whitespace chars
  |           # or
  \b          # match word break
  [^\d\s]+    # match 1+ chars other than digits and whitespace
  \s+         # match 1+ whitespace chars
)             # end non-cap grp  
(April)       # match 'April' in capture group
\b            # match word break
(?!           # begin negative lookahead
  \s*         # match 0+ whitespace chars         
  \w*         # match 0+ word chars
  \d          # match a digit
)             # end negative lookahead
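Because the non-capturing prefix is part of the overall match, a plain substitution would also swallow the preceding word. The sketch below is one way to replace only capture group 1; it assumes the pattern as annotated above, written with re.VERBOSE, and the helper name replace_april is hypothetical.

```python
import re

# The annotated pattern, compiled with re.VERBOSE so the comments can stay inline.
APRIL = re.compile(
    r"""
    (?:                 # non-capturing prefix: what may precede the name
        ^\s*            #   start of string plus optional whitespace
      | [^\w\s]+\s*     #   punctuation plus optional whitespace
      | \b[^\d\s]+\s+   #   a digit-free word plus whitespace
    )
    (April)\b           # the target word, captured in group 1
    (?!\s*\w*\d)        # not followed by a word containing a digit
    """,
    re.VERBOSE | re.IGNORECASE,
)

def replace_april(text, replacement="PERSON"):
    # Keep whatever the prefix consumed and swap out only group 1.
    return APRIL.sub(
        lambda m: m.group(0)[: m.start(1) - m.start()] + replacement, text
    )

sample = "Today is 4th April. Her name is April. Tomorrow is April 5th."
print(replace_april(sample))
```

Slicing by the span of group 1 avoids having to add a second capture group just to restore the prefix.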

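The approach I have taken is to specify what may precede "April" and what may not follow it. I could not specify what may not precede "April", because that would require a variable-width negative lookbehind, which Python's re engine does not support.

I have assumed that "April" may be:

at the beginning of the string, possibly after leading whitespace;
preceded by one or more characters that are neither word characters nor whitespace, possibly followed by whitespace; or
preceded by a word that contains no digits, followed by whitespace.

I have also assumed that "April" is followed by a word break and is not followed by a word containing a digit, possibly after whitespace.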
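The assumptions above can be checked against a few short inputs. This is only a sketch; the test sentences are made up for illustration and are not from the original question apart from the last three.

```python
import re

pattern = re.compile(
    r"(?:^\s*|[^\w\s]+\s*|\b[^\d\s]+\s+)(April)\b(?!\s*\w*\d)", re.IGNORECASE
)

# Each input maps to whether the pattern is expected to find "April".
cases = {
    "  April called.": True,          # start of string, leading whitespace
    "Hello, April!": True,            # preceded by punctuation and a space
    "Her name is April.": True,       # preceded by a digit-free word
    "Today is 4th April.": False,     # preceded by a word containing a digit
    "Tomorrow is April 5th.": False,  # followed by a word containing a digit
}

results = {text: pattern.search(text) is not None for text in cases}
for text, found in results.items():
    print(f"{text!r}: {found}")
```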

Answer 2

This can be done with the regex library from PyPI, which supports variable-length lookbehind.

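import regex  # third-party module: pip install regex

# Named `text` rather than `str` to avoid shadowing the built-in type.
text = 'Today is 4th April. Her name is April. Tomorrow is April 5th.'
res = regex.sub(r'(?<!\d[a-z]* )April(?! [a-z]*\d)', 'PERSON', text)
print(res)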

Output:

Today is 4th April. Her name is PERSON. Tomorrow is April 5th.

Explanation:

(?<!\d[a-z]* )      # negative lookbehind, make sure we haven't a digit followed by 0 or more letters and a space before
April               # literally
(?! [a-z]*\d)       # negative lookahead, make sure we haven't a space, 0 or more letters and a digit after

Update, using the re module:

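import re

# Named `text` rather than `str` to avoid shadowing the built-in type.
text = 'Today is 4th April. Her name is April. Tomorrow is April 5th.'
# Group 1 captures the preceding digit-free word; the raw string keeps \g<1> from
# being read as an invalid string escape.
res = re.sub(r'(\b[a-z]+ )April(?! [a-z]*\d)', r'\g<1>PERSON', text)
print(res)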
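One thing to be aware of with the re version: it requires a lowercase word and a space directly before "April", so a name at the very start of the string or after a capitalized word is not replaced. The variant below is a hedged sketch, not part of the original answer; it assumes that allowing the start of the string and ignoring case is acceptable for your data.

```python
import re

text = "April is here. Dear April, see you on April 5th or 4th April."

# Hypothetical variant: group 1 is either the start of the string (empty match)
# or a digit-free word plus a space; IGNORECASE lets [a-z] cover capitalized words.
pattern = re.compile(r"(^|\b[a-z]+ )April(?! [a-z]*\d)", re.IGNORECASE)
res = pattern.sub(r"\g<1>PERSON", text)
print(res)
```

Note that re.IGNORECASE also makes the literal "April" match "april", which may or may not be what you want.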

Answer 1
1 2020-03-30 01:25:08

Answer 2
1 2020-03-30 10:14:18