Python regex: replace groups with their group names in one pass

Question

Given the following use of re:

import re
s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)')
p.sub(r'\1',s)

the result is:

'the dog and cat wore 7 blue hats 9 days ago'
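As a side note, the named group in this pattern can also be referenced by name in the replacement string, using \g<animal> instead of \1. The sketch below reuses the pattern and string from the question; the variable names by_number and by_name are only illustrative.

```python
import re

s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)')

# A named group is still numbered, so \1 and \g<animal> refer to the same group.
by_number = p.sub(r'\1', s)
by_name = p.sub(r'\g<animal>', s)

print(by_number)
print(by_name)
```

Both calls insert the text captured by the group, not the group's name, which is why a different approach is needed for what follows.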

Is it possible to write a re.sub call such that:

import re
s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])')

the result is:

'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'

Oddly, there is documentation on replacing strings and on getting group names, but no well-documented way to do both at once.

Answer 1

You can use re.sub with a callback that returns matchobj.lastgroup:

import re

s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])')

def callback(matchobj):
    return matchobj.lastgroup

result = p.sub(callback, s)
print(result)

Output:

the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago
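To see why the callback works, it can help to inspect what lastgroup holds for each individual match. The following is a minimal sketch using the same pattern and string as above; the name matches is only illustrative. Because the three alternatives are not nested, exactly one named group takes part in each match, and lastgroup reports its name.

```python
import re

s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
p = re.compile(r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])')

# Pair each matched substring with the name of the group that matched it.
matches = [(m.group(0), m.lastgroup) for m in p.finditer(s)]

for text, name in matches:
    print(repr(text), '->', name)
```

Note that the third "blue" (before "hats") is not matched at all, so it is left untouched by the substitution.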

Note that if you are using Pandas, you can use Series.str.replace:

import pandas as pd

def callback(matchobj):
    return matchobj.lastgroup

df = pd.DataFrame({'foo':["the blue dog", "and blue cat wore 7 blue", "hats 9", 
                          "days ago"]})
pat = r'blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9])'
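# regex=True is needed when the replacement is a callable
# (the default for regex became False in pandas 2.0)
df['result'] = df['foo'].str.replace(pat, callback, regex=True)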
print(df)

Output:

                        foo                                 result
0              the blue dog                             the animal
1  and blue cat wore 7 blue  and animal wore numberBelowSeven blue
2                    hats 9                    hats numberNotSeven
3                  days ago                               days ago

If you have nested named groups, you may need a more complicated callback that iterates over matchobj.groupdict().items() to collect all the relevant group names:

import pandas as pd

def callback(matchobj):
    names = [groupname for groupname, matchstr in matchobj.groupdict().items()
             if matchstr is not None]
    names = sorted(names, key=lambda name: matchobj.span(name))
    result = ' '.join(names)
    return result

df = pd.DataFrame({'foo':["the blue dog", "and blue cat wore 7 blue", "hats 9", 
                          "days ago"]})

pat=r'blue (?P<animal>dog|cat)|(?P<numberItem>(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'

# pat=r'(?P<someItem>blue (?P<animal>dog|cat)|(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'

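# regex=True is needed when the replacement is a callable
# (the default for regex became False in pandas 2.0)
df['result'] = df['foo'].str.replace(pat, callback, regex=True)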
print(df)

Output:

                        foo                                            result
0              the blue dog                                        the animal
1  and blue cat wore 7 blue  and animal wore numberItem numberBelowSeven blue
2                    hats 9                    hats numberItem numberNotSeven
3                  days ago                                          days ago
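The same callback also works with plain re.sub, without Pandas. The sketch below compares it with the simpler lastgroup callback on the nested pattern; the function names last_name and all_names are illustrative, not part of any library. With nested groups, lastgroup reports only the outer group (the last one to close), which is why collecting the names from groupdict() is needed here.

```python
import re

s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
pat = re.compile(
    r'blue (?P<animal>dog|cat)'
    r'|(?P<numberItem>(?P<numberBelowSeven>[0-7])|(?P<numberNotSeven>[8-9]))'
)

def last_name(matchobj):
    # Only the name of the last group to close, i.e. the outer one when nested.
    return matchobj.lastgroup

def all_names(matchobj):
    # Every named group that took part in the match, ordered by span.
    names = [name for name, text in matchobj.groupdict().items()
             if text is not None]
    names = sorted(names, key=lambda name: matchobj.span(name))
    return ' '.join(names)

outer_only = pat.sub(last_name, s)
nested = pat.sub(all_names, s)

print(outer_only)
print(nested)
```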

Answer 2

Why not just call re.sub() multiple times:

>>> s = re.sub(r"blue (dog|cat)", "animal", s)
>>> s = re.sub(r"\b[0-7]\b", "numberBelowSeven", s)
>>> s = re.sub(r"\b[8-9]\b", "numberNotSeven", s)
>>> s
'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'

You can then put these into a "list of changes" and apply them one by one:

>>> changes = [
...     (re.compile(r"blue (dog|cat)"), "animal"),
...     (re.compile(r"\b[0-7]\b"), "numberBelowSeven"),
...     (re.compile(r"\b[8-9]\b"), "numberNotSeven")
... ]
>>> s = "the blue dog and blue cat wore 7 blue hats 9 days ago"
>>> for pattern, replacement in changes:
...     s = pattern.sub(replacement, s)
... 
>>> s
'the animal and animal wore numberBelowSeven blue hats numberNotSeven days ago'

Note that I have also added word boundary checks (\b).
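To illustrate what the word boundary checks change, here is a small sketch on a hypothetical input that contains a two-digit number (the string and variable names are made up for this example). Without \b, each digit of "17" is replaced separately; with \b, only the standalone "7" is replaced.

```python
import re

# Hypothetical input: "17" is a multi-digit number, "7" stands alone.
text = "wore 7 blue hats 17 days ago"

without_boundary = re.sub(r"[0-7]", "numberBelowSeven", text)
with_boundary = re.sub(r"\b[0-7]\b", "numberBelowSeven", text)

print(without_boundary)
print(with_boundary)
```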

2 answers

Answer 1
1 2016-04-29 18:01:37

Answer 2
0 2016-04-29 17:59:48
