
Regex in python: is it possible to get the match, replacement, and final string?

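To perform a regex substitution, there are three things that you give it:

  • the match pattern
  • the replacement pattern
  • the original string

There are three things the regex engine finds that are of interest to me:

  • the matched string
  • the replacement string
  • the final processed string

When using re.sub, the final string is what is returned. But is it possible to access the other two things, the matched string and the replacement string?

Here's an example: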

import re

orig = "This is the original string."
matchpat = "(orig.*?l)"
replacepat = "not the \\1"

final = re.sub(matchpat, replacepat, orig)
print(final)
# This is the not the original string.

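The matched string is "original" and the replacement string is "not the original". Is there a way to get them? I'm writing a script to search and replace in many files, and I want it to print out what it is finding and replacing, without printing the entire line.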

class Replacement(object):

    def __init__(self, replacement):
        self.replacement = replacement
        self.matched = None
        self.replaced = None

    def __call__(self, match):
        self.matched = match.group(0)
        self.replaced = match.expand(self.replacement)
        return self.replaced

>>> repl = Replacement('not the \\1')
>>> re.sub('(orig.*?l)', repl, 'This is the original string.')
    'This is the not the original string.'
>>> repl.matched
    'original'
>>> repl.replaced
    'not the original'

Edit: as @FJ pointed out, the above only remembers the last match/replacement. This version handles multiple occurrences:

class Replacement(object):

    def __init__(self, replacement):
        self.replacement = replacement
        self.occurrences = []

    def __call__(self, match):
        matched = match.group(0)
        replaced = match.expand(self.replacement)
        self.occurrences.append((matched, replaced))
        return replaced

>>> repl = Replacement('[\\1]')
>>> re.sub(r'\s(\d)', repl, '1 2 3')
    '1[2][3]'

>>> for matched, replaced in repl.occurrences:
...     print(matched, '=>', replaced)
...
 2 => [2]
 3 => [3]

I looked at the documentation, and it seems you can pass a function reference to re.sub:

import re

def re_sub_verbose(pattern, replace, string):
  def substitute(match):
    # Expand the replacement template (e.g. \1) once for this match
    replacement = match.expand(replace)
    print('Matched:', match.group(0))
    print('Replacing with:', replacement)

    return replacement

  result = re.sub(pattern, substitute, string)
  print('Final string:', result)

  return result

I get this output when running re_sub_verbose("(orig.*?l)", "not the \\1", "This is the original string."):

Matched: original
Replacing with: not the original
Final string: This is the not the original string.
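
For the search-and-replace script mentioned in the question, it may be handier to get the pairs back as data instead of printing them inside the callback. Here is a minimal sketch along the lines of the two answers above; sub_with_log is a made-up name for illustration, not something the re module provides, and it assumes str patterns and a template-style replacement string.

```python
import re

def sub_with_log(pattern, repl, text):
    """Hypothetical helper: return (final_string, [(matched, replaced), ...])."""
    occurrences = []

    def record(match):
        # match.expand applies the replacement template (e.g. \1) to this match.
        replaced = match.expand(repl)
        occurrences.append((match.group(0), replaced))
        return replaced

    final = re.sub(pattern, record, text)
    return final, occurrences


final, pairs = sub_with_log(r"(orig.*?l)", r"not the \1", "This is the original string.")
for matched, replaced in pairs:
    print(repr(matched), "=>", repr(replaced))
print(final)
```

The caller can then decide what to print per file, for example only the (matched, replaced) pairs and never the whole line.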
