Appending to every regex match found in string [Python]

Meet string 's':

AA Well, This is amazing. AA It is. BB Wow. AA As I said.

Here's the code:

import re

s = 'AA Well, This is amazing. AA It is. BB Wow. AA As I said.'
separate_words_tuple = s.split()
i = 0
for word in separate_words_tuple:
    check_word = word
    if re.search("[A-Z]+", str(check_word)):
        match = re.search("[A-Z]+", str(check_word))
        if (len(match.group(0)) != 1):
            actor_name = match.group(0) + "_" + str(i)
            print(re.sub(r"[A-Z]+", actor_name, s))
            i += 1

Here's what 's' looks like with that code:

AA_3 AA_3ell, AA_3his is amazing. AA_3 AA_3t is. AA_3 AA_3ow. AA_3 AA_3s AA_3 said.

Here's what I want 's' to look like:

AA_0 Well, This is amazing. AA_1 It is. BB_2 Wow. AA_3 As I said.

Use a function attribute as a static counter to number all matched items. incCaps() is the replacement callback passed to re.sub.

import re

s = 'AA Well, This is amazing. AA It is. BB Wow. AA As I said.'

def incCaps(m):
    replaced = m.group(0) + '_' + str(incCaps.counter)
    incCaps.counter += 1
    return replaced

incCaps.counter = 0
s = re.sub(r'\b[A-Z]{2,}\b', incCaps, s)

print(s)

The output:

AA_0 Well, This is amazing. AA_1 It is. BB_2 Wow. AA_3 As I said.
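To see why a callback works here, it may help to watch what re.sub hands to it. The sketch below is only an illustration: `record` and `seen` are made-up names, and the callback returns the match unchanged so the string is not modified. It shows that the function is called once per non-overlapping match, left to right, which is what makes a running counter possible.

```python
import re

s = 'AA Well, This is amazing. AA It is. BB Wow. AA As I said.'
seen = []  # hypothetical log of (matched text, start offset)

def record(m):
    # m is a match object; re.sub calls this once for every match, in order
    seen.append((m.group(0), m.start()))
    return m.group(0)  # return the match unchanged

re.sub(r'\b[A-Z]{2,}\b', record, s)
print(seen)
```

Single capitals such as "I" or the "W" in "Well" are skipped because the pattern requires at least two capital letters between word boundaries.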

@RomanPerekhrest is right: re.sub gives us a great opportunity to pass a function as the replacement argument. But a global function that carries state is a code smell: it is hard to test and cannot safely be reused, because the counter keeps its value between calls.
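A quick way to see the problem, as a sketch that reuses the incCaps callback from the answer above (the second input string 'AA BB' is made up for the demonstration): calling re.sub a second time does not start again from 0 unless the counter is reset by hand.

```python
import re

def incCaps(m):
    replaced = m.group(0) + '_' + str(incCaps.counter)
    incCaps.counter += 1
    return replaced

incCaps.counter = 0
pattern = r'\b[A-Z]{2,}\b'

first = re.sub(pattern, incCaps, 'AA Well, This is amazing. AA It is. BB Wow. AA As I said.')
# The counter is now 4, and the next call silently continues from there.
second = re.sub(pattern, incCaps, 'AA BB')

print(first)
print(second)
```

The second string comes out numbered from 4 rather than 0, which is the kind of hidden coupling the refactorings below try to avoid.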

I would suggest two similar ways of refactoring:

  1. Keeping match_counter global for reuse, so that each call returns a fresh callback with its own counter:

     def match_counter():
         counter = -1
         def inc_caps(m):
             nonlocal counter  # required, otherwise counter += 1 raises UnboundLocalError
             counter += 1
             return '{}_{}'.format(m.group(0), counter)
         return inc_caps

     s = re.sub(r'\b[A-Z]{2,}\b', match_counter(), s)

  2. or making the whole thing a single unit of functionality:

     def replaced_by_counts(regex, string):
         # regex is a compiled pattern, e.g. re.compile(r'\b[A-Z]{2,}\b')
         counter = -1
         def inc_caps(m):
             nonlocal counter
             counter += 1
             return '{}_{}'.format(m.group(0), counter)
         return regex.sub(inc_caps, string)
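As a possible variation on the second approach, the counter can come from itertools.count, which avoids rebinding a variable inside the nested function. This is a sketch, not the answer's own code: `number_matches` and `caps` are hypothetical names, and the pattern is assumed to be the one used earlier in the thread.

```python
import itertools
import re

def number_matches(regex, string):
    # A fresh counter per call, so every call numbers its matches from 0
    counter = itertools.count()
    return regex.sub(
        lambda m: '{}_{}'.format(m.group(0), next(counter)),
        string,
    )

caps = re.compile(r'\b[A-Z]{2,}\b')  # compiled pattern, as regex.sub requires

first = number_matches(caps, 'AA Well, This is amazing. AA It is. BB Wow. AA As I said.')
second = number_matches(caps, 'AA BB')

print(first)
print(second)
```

Because the state lives inside each call, the two results are independent of each other, and the function can be tested with nothing more than an input string and an expected output.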
