How to use regex to search while avoiding entries in a list?

Question

I have a long list of entries in a file, in the following format:

<space><space><number><space>=<space>"<word/phrase/sentence>"

For example

12345 = "Section 3 is ready for review"

24680 = "Bob to review Chapter 4"

I need a way to insert additional text at the start of the word/phrase/sentence, but only if it does not begin with one of several keywords.

Additional text: 'Complete: '

Keyword list: key_words_list = ['Section', 'Page', 'Heading']

For example

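12345 = "Section 3 is ready for review" (no change needed: the sentence starts with "Section", which is in the list)

24680 = "Complete: Bob to review Chapter 4" ("Complete: " is added to the start of the sentence because its first word is not in the list)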

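This could be done with a lot of string splitting and if statements, but regex seems like it should be a more concise and tidier solution. I have the following, which does not take the list into account: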

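import re

for line in lines:
    # Adds the prefix after the opening quote of every line; the keyword list is not considered yet
    line = re.sub(r'(^\s\s[0-9]+\s=\s")', r'\1Complete: ', line)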

I also have some code that identifies the lines that need to be changed:

print([w for w in re.findall(r'^\s\s[0-9]+\s=\s"([\w+=?\s?,?.?]+)"', line) if w not in key_words_list])
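Note that re.findall returns the captured group, i.e. the whole quoted sentence, so the filter compares "Section 3 is ready for review" against "Section"; a multi-word sentence is never equal to a keyword and is therefore never excluded. A minimal sketch that compares only the first word instead, assuming the lines are already in a list called lines (the helper name needs_prefix is hypothetical):

```python
import re

key_words_list = ['Section', 'Page', 'Heading']
lines = ['  12345 = "Section 3 is ready for review"',
         '  24680 = "Bob to review Chapter 4"']

# Capture everything between the quotes that follow "<number> = "
pattern = re.compile(r'^\s\s[0-9]+\s=\s"([^"]*)"')

def needs_prefix(line):
    """Hypothetical helper: True if the quoted text does not start with a keyword."""
    m = pattern.match(line)
    if m is None:
        return False  # line is not in the expected format, leave it alone
    words = m.group(1).split()
    # Compare only the first word with the keywords, not the whole sentence
    return not words or words[0] not in key_words_list

to_change = [line for line in lines if needs_prefix(line)]
print(to_change)
```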

Is regex the best option for what I need? If so, what am I missing?

Sample input:

12345 = "Section 3 is ready for review"

24680 = "Bob to review Chapter 4"

Sample output:

12345 = "Section 3 is ready for review"

24680 = "Complete: Bob to review Chapter 4"
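For comparison, the string-splitting approach mentioned in the question can be sketched without regex. This is a minimal sketch under stated assumptions: the function name add_prefix is hypothetical, it splits the line at the first double quote, and it does not validate the "<number> =" part of the line:

```python
key_words_list = ['Section', 'Page', 'Heading']
add = 'Complete: '

def add_prefix(line):
    """Hypothetical helper: insert `add` after the first double quote unless
    the quoted text starts with a keyword (words are whitespace-delimited)."""
    head, sep, rest = line.partition('"')
    if not sep:
        return line  # no opening quote: leave the line unchanged
    words = rest.split(maxsplit=1)
    # Strip the closing quote in case the phrase is a single word, e.g. "Section"
    first = words[0].rstrip('"') if words else ''
    if first in key_words_list:
        return line
    return head + sep + add + rest

for line in ['  12345 = "Section 3 is ready for review"',
             '  24680 = "Bob to review Chapter 4"']:
    print(add_prefix(line))
```

Because this splits on whitespace, a first word such as "Section:" would not count as a keyword here, whereas the \b in the regex answer would treat it as one.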

Answer 1

You can use the regex

^\s{2}[0-9]+\s=\s"(?!(?:Section|Page|Heading)\b)

See the regex demo. Details:

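^ - start of the string
\s{2} - two whitespace characters
[0-9]+ - one or more digits
\s=\s - an = with a single whitespace character on each side
" - a " character
(?!(?:Section|Page|Heading)\b) - a negative lookahead that fails the match if Section, Page or Heading, as a whole word, appears immediately to the right of the current position.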
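To see what the \b inside the lookahead does, here is a small check with one extra, made-up line whose text starts with "Sections": because "Sections" is not the whole word "Section", the lookahead does not block it, so that line would get the prefix. Lookarounds consume no characters, so a successful match ends right after the opening quote:

```python
import re

# The pattern from this answer, written out with the three keywords
pattern = re.compile(r'^\s{2}[0-9]+\s=\s"(?!(?:Section|Page|Heading)\b)')

samples = [
    '  12345 = "Section 3 is ready for review"',  # whole-word keyword: no match
    '  13579 = "Sections 1-3 are done"',          # made-up line; "Sections" is not a whole-word keyword: match
    '  24680 = "Bob to review Chapter 4"',        # no keyword: match
]

results = []
for s in samples:
    m = pattern.match(s)
    # When present, the match ends at the opening quote because the lookahead consumes nothing
    results.append(m.group(0) if m else None)
    print(repr(results[-1]))
```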

See the Python demo:

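import re
texts = ['  12345 = "Section 3 is ready for review"', '  24680 = "Bob to review Chapter 4"']
add = 'Complete: '
key_words_list = ['Section', 'Page', 'Heading']
# Build "Section|Page|Heading" from the list; {{2}} is a literal {2} inside the f-string
pattern = re.compile(fr'^\s{{2}}[0-9]+\s=\s"(?!(?:{"|".join(key_words_list)})\b)')
for text in texts:
    # \g<0> re-inserts the whole match (up to and including the opening quote), followed by the added text
    print(pattern.sub(fr'\g<0>{add}', text))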

# =>   12345 = "Section 3 is ready for review"
#      24680 = "Complete: Bob to review Chapter 4"
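Since the entries live in a file, one option is to run the substitution over the whole file content at once. This is a sketch under a few assumptions that go beyond the answer above: re.MULTILINE makes ^ match at the start of every line; literal spaces replace \s so a match cannot run across a newline; re.escape guards against keywords that contain regex metacharacters; and a replacement function is used instead of \g<0> so any backslashes in add are never interpreted. The name add_prefix_to_text is hypothetical:

```python
import re

key_words_list = ['Section', 'Page', 'Heading']
add = 'Complete: '

# Escape each keyword so characters such as '.' or '+' are matched literally
alternation = '|'.join(map(re.escape, key_words_list))
# re.MULTILINE lets ^ match at the start of every line, not only at the start of the string
pattern = re.compile(r'^ {2}[0-9]+ = "(?!(?:' + alternation + r')\b)', re.MULTILINE)

def add_prefix_to_text(text):
    """Hypothetical helper: prefix every qualifying line in a multi-line string."""
    # A function replacement inserts `add` verbatim, with no escape processing
    return pattern.sub(lambda m: m.group(0) + add, text)

content = '  12345 = "Section 3 is ready for review"\n  24680 = "Bob to review Chapter 4"\n'
print(add_prefix_to_text(content), end='')
```

To process a real file, the content could be read with open(...).read(), passed through add_prefix_to_text, and written back.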

Solution 1 | 1 | Accepted 2021-04-26 19:27:14