Appending to every regex match found in string [Python]
Meet string 's':
AA Well, This is amazing.
AA It is.
BB Wow.
AA As I said.
Here's the code:
import re
s = "AA Well, This is amazing. AA It is. BB Wow. AA As I said."
separate_words_tuple = s.split()
i = 0
for word in separate_words_tuple:
    check_word = word
    if re.search("[A-Z]+", str(check_word)):
        match = re.search("[A-Z]+", str(check_word))
        if (len(match.group(0)) != 1):
            actor_name = match.group(0) + "_" + str(i)
            print(re.sub(r"[A-Z]+", actor_name, s))
            i += 1
Here's what 's' looks like with that code:
AA_3 AA_3ell, AA_3his is amazing.
AA_3 AA_3t is.
AA_3 AA_3ow.
AA_3 AA_3s AA_3 said.
Here's what I want 's' to look like:
AA_0 Well, This is amazing.
AA_1 It is.
BB_2 Wow.
AA_3 As I said.
Use a function attribute as a static counter to number every matched item. incCaps() is the replacement callback passed to re.sub.
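import re
s = 'AA Well, This is amazing. AA It is. BB Wow. AA As I said.'
def incCaps(m):
    # Append the running count to the matched text, then advance the counter.
    replaced = m.group(0) + '_' + str(incCaps.counter)
    incCaps.counter += 1
    return replaced
incCaps.counter = 0
# \b[A-Z]{2,}\b matches only whole words of two or more capitals.
s = re.sub(r'\b[A-Z]{2,}\b', incCaps, s)
print(s)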
The output:
AA_0 Well, This is amazing. AA_1 It is. BB_2 Wow. AA_3 As I said.
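As a variation on the same callback idea, the counter can live in an iterator instead of a function attribute. This is only a sketch: itertools.count is standard library, but the names counter and number_match are made up here for illustration and are not part of the answer above.

```python
import itertools
import re

s = 'AA Well, This is amazing. AA It is. BB Wow. AA As I said.'

# Hypothetical helper names; itertools.count() yields 0, 1, 2, ...
counter = itertools.count()

def number_match(m):
    # Called once per match; next(counter) supplies the running index.
    return '{}_{}'.format(m.group(0), next(counter))

result = re.sub(r'\b[A-Z]{2,}\b', number_match, s)
print(result)
```

The callback still depends on state outside itself, so a fresh itertools.count() is needed for every string that should start numbering at 0.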
@RomanPerekhrest is right: re.sub gives us a great opportunity to pass a function as the replacement argument. But a global function that carries state is a code smell: the counter persists between calls, so the function isn't easily testable or even reusable.
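To make the reusability concern concrete, here is a small sketch of what happens when the stateful callback is used on two strings in a row. The sample strings and the names pattern, first and second are invented for the demonstration.

```python
import re

def incCaps(m):
    replaced = m.group(0) + '_' + str(incCaps.counter)
    incCaps.counter += 1
    return replaced

incCaps.counter = 0

pattern = r'\b[A-Z]{2,}\b'
first = re.sub(pattern, incCaps, 'AA Hi. BB Hello.')
# The counter is not reset, so numbering continues from the first call.
second = re.sub(pattern, incCaps, 'CC Bye.')

print(first)   # AA_0 Hi. BB_1 Hello.
print(second)  # CC_2 Bye.
```

Unless incCaps.counter is reset by hand before each call, the result for one string depends on every string processed before it.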
I would suggest two similar ways of refactoring:
Having a global match_counter for reuse:
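import re
def match_counter():
    counter = -1
    def inc_caps(m):
        nonlocal counter  # required in Python 3 to rebind the enclosing variable
        counter += 1
        return '{}_{}'.format(m.group(0), counter)
    return inc_caps
# Each match_counter() call returns a fresh callback that starts at 0.
s = re.sub(r'\b[A-Z]{2,}\b', match_counter(), s)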
or making the whole thing a single unit of functionality:
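def replaced_by_counts(regex, string):
    # regex is expected to be a compiled pattern, e.g. from re.compile(...)
    counter = -1
    def inc_caps(m):
        nonlocal counter  # required in Python 3 to rebind the enclosing variable
        counter += 1
        return '{}_{}'.format(m.group(0), counter)
    return regex.sub(inc_caps, string)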
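A possible usage sketch for the single-unit version follows. It assumes Python 3 (for nonlocal) and a pattern compiled with re.compile; the name caps and the two input strings are chosen here for illustration.

```python
import re

def replaced_by_counts(regex, string):
    counter = -1
    def inc_caps(m):
        nonlocal counter  # rebind the counter owned by this call
        counter += 1
        return '{}_{}'.format(m.group(0), counter)
    return regex.sub(inc_caps, string)

# Hypothetical name for the compiled pattern.
caps = re.compile(r'\b[A-Z]{2,}\b')

first = replaced_by_counts(caps, 'AA Well, This is amazing. AA It is.')
second = replaced_by_counts(caps, 'BB Wow. AA As I said.')

print(first)   # AA_0 Well, This is amazing. AA_1 It is.
print(second)  # BB_0 Wow. AA_1 As I said.
```

Because the counter is created inside each call, numbering restarts at 0 for every string, which makes the function straightforward to test in isolation.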