How to match and replace multiple strings with regex in Python

Question

I am trying to replace some text in Python with regex.

My text looks like this:

WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1

WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2

What I am trying to do is put the names in double square brackets and remove the IDs, so that it ends up looking like this:

WORKGROUP 1. [[John Doe]], [[Jane Smith]], [[Ohe Keedoke]]
Situation paragraph 1

WORKGROUP 2. [[John Smith]], [[Jane Doe]]
Situation paragraph 2

So far I have this:

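# text holds the input shown above; each call feeds its result to the next
text = re.sub(r"(WORKGROUP\s\d\.\s)", r"\1[[", text)
text = re.sub(r"(WORKGROUP\s\d\..+?)(?:\s\b\w+\b),(?:\s)(.+\n)", r"\1]], [[\2", text)
text = re.sub(r"(WORKGROUP\s\d\..+?)(?:\s\b\w+\b)(\n)", r"\1]]\2", text)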

This works for groups with two people (WORKGROUP 2), but if there are more than two it leaves every ID except the first and last person's. So WORKGROUP 1 ends up looking like this:

WORKGROUP 1. [[John Doe]], [[Jane Smith ID456, Ohe Keedoke]]
Situation paragraph 1

Unfortunately, I can't do something like

re.sub(r"((\s\b\w+\b),(\s))+",r"\1]], [[\2")

because it will match inside the situation paragraphs.

My question is: is it possible to do multiple match/replacements within one segment of a string without applying them to the whole string?

Answer 1

If you have the regex module installed:

(?<=\bWORKGROUP\s+\d+\.\s|,)\s*(.+?)\s*ID\d+\s*(?=,|$)

might work OK.
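The third-party regex module matters here because the lookbehind in this expression has alternatives of different widths and contains \s+ and \d+, and the standard-library re module only accepts fixed-width lookbehind. A minimal check using only the standard library (the names pattern and compiled_ok are just for illustration):

```python
import re

# The expression from this answer; its lookbehind is not fixed-width.
pattern = r"(?<=\bWORKGROUP\s+\d+\.\s|,)\s*(.+?)\s*ID\d+\s*(?=,|$)"

try:
    re.compile(pattern)
    compiled_ok = True
except re.error as exc:
    # Standard-library re rejects variable-width lookbehind at compile time.
    compiled_ok = False
    print("re rejected the pattern:", exc)
```

With import regex as re, as in the test below, the same pattern compiles.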

If it is not installed, you can install it from your terminal by running:

$ pip install regex

or

$ pip3 install regex

Here, we're assuming that other ID\d+ tokens might be present elsewhere in your text; if they are not, your problem would be much simpler.
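As a rough sketch of that simpler case, assuming ID tokens occur only after names on WORKGROUP lines and that names consist only of word characters and spaces (both are assumptions, not something the question guarantees), a single standard-library substitution may be enough:

```python
import re

text = """WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1
"""

# A run of word characters and spaces followed by " ID<digits>" is treated as a name.
# The "." after the group number is not in the character class, so "WORKGROUP 1" is skipped.
simple = re.sub(r"(\w[\w ]*?) ID\d+", r"[[\1]]", text)
print(simple)
```

Names containing hyphens, apostrophes, or periods would need a wider character class.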

Test

import regex as re

regex = r"(?<=\bWORKGROUP\s+\d+\.\s|,)\s*(.+?)\s*ID\d+\s*(?=,|$)"

test_str = '''

WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1
WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2

WORKGROUP 11. Bob Doe ID123, Alice Doe ID123, John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1

WORKGROUP 21. John Smith ID321, Jane Doe ID654
Situation paragraph 2

'''


subst = "[[\\1]]"

print(re.sub(regex, subst, test_str, flags=re.MULTILINE))

Output

WORKGROUP 1. [[John Doe]],[[Jane Smith]],[[Ohe Keedoke]]
Situation paragraph 1
WORKGROUP 2. [[John Smith]],[[Jane Doe]]
Situation paragraph 2

WORKGROUP 11. [[Bob Doe]],[[Alice Doe]],[[John Doe]],[[Jane Smith]],[[Ohe Keedoke]]
Situation paragraph 1

WORKGROUP 21. [[John Smith]],[[Jane Doe]]
Situation paragraph 2

If you wish to simplify/modify/explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.

Answer 2

Code

import re

test = """
WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1

WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2
"""

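# An ID followed by a comma closes one name and opens the next
test = re.sub(r' ID[0-9]+, ', ']], [[', test)
# Every ". " in the text opens a bracket (not only the one after the group number)
test = re.sub(r'\. ', '. [[', test)
# Any remaining ID closes the last name on the line
test = re.sub(r' ID[0-9]+', ']]', test)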
print(test)

Output

WORKGROUP 1. [[John Doe]], [[Jane Smith]], [[Ohe Keedoke]]
Situation paragraph 1

WORKGROUP 2. [[John Smith]], [[Jane Doe]]
Situation paragraph 2
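The three substitutions above run over the whole text, so a ". " or an ID inside a situation paragraph would be rewritten as well. One possible way to limit them, sketched here with a hypothetical helper bracket_names and an added count=1 (neither is part of the original answer), is to apply them only to lines that start with WORKGROUP:

```python
import re

def bracket_names(line):
    # Same three substitutions as above, applied to a single line.
    line = re.sub(r" ID[0-9]+, ", "]], [[", line)
    # count=1 so only the ". " after the group number opens a bracket.
    line = re.sub(r"\. ", ". [[", line, count=1)
    line = re.sub(r" ID[0-9]+", "]]", line)
    return line

text = """WORKGROUP 1. John Doe ID123, Jane Smith ID456
Situation paragraph 1. It mentions ID999 in passing.
"""

# Rewrite WORKGROUP lines only; every other line is passed through unchanged.
result = "\n".join(
    bracket_names(line) if line.startswith("WORKGROUP") else line
    for line in text.split("\n")
)
print(result)
```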

Answer 3

You can nest the substitutions: the outer substitution finds lines that start with WORKGROUP, and the inner substitution finds and replaces the comma-separated tokens inside each of those lines:

re.sub(
    r'^(WORKGROUP\s+\d+\.\s*)(.*)',
    lambda m: m.group(1) + re.sub(r'([^,\s][^,]*)\s+\S+(?=,|$)', r'[[\1]]', m.group(2)),
    text,
    flags=re.MULTILINE
)

so that given:

text = '''WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1

WORKGROUP 2. John Smith ID321, Jane Doe ID654
Situation paragraph 2'''

the expression returns:

WORKGROUP 1. [[John Doe]], [[Jane Smith]], [[Ohe Keedoke]]
Situation paragraph 1

WORKGROUP 2. [[John Smith]], [[Jane Doe]]
Situation paragraph 2

Demo: https://repl.it/@blhsing/BoldElderlyQuerylanguage
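For readability, the lambda in the nested call can be unpacked into a named function. The sketch below uses the same two patterns; the function name bracket_workgroup_line and the sample paragraph with commas are made up for illustration:

```python
import re

def bracket_workgroup_line(match):
    # group(1) is the "WORKGROUP n. " prefix, group(2) is the rest of that line.
    prefix, members = match.group(1), match.group(2)
    # Wrap each comma-separated name in [[...]] and drop the token that follows it.
    members = re.sub(r"([^,\s][^,]*)\s+\S+(?=,|$)", r"[[\1]]", members)
    return prefix + members

text = """WORKGROUP 1. John Doe ID123, Jane Smith ID456, Ohe Keedoke ID7890
Situation paragraph 1, with a comma, and more text"""

# The outer pattern only matches lines starting with WORKGROUP, so the paragraph is untouched.
result = re.sub(
    r"^(WORKGROUP\s+\d+\.\s*)(.*)",
    bracket_workgroup_line,
    text,
    flags=re.MULTILINE,
)
print(result)
```

Because the inner substitution only ever sees group(2) of a WORKGROUP line, commas in the situation paragraphs are never considered.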

3 answers

Answer 1: 2019-10-20 20:28:49
Answer 2: 2019-10-20 20:30:12
Answer 3 (accepted): 2019-10-20 20:58:58
