Can I expand a single string with the union of several regexes in Python?

Question

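I'm hacking on a package that converts file types. It lets users specify the conversion (a Python function) and a regex that controls how the file name is changed.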

In one case I have a series of regexes and a single output string, and I want to expand that string with the union of all the regexes' groups:

import re
re_strings = ['(.*).txt', '(.*).ogg', 'another(?P<name>.*)']
# build a real list: map() is lazy in Python 3 and cannot be indexed
regexes = [re.compile(s) for s in re_strings]
input_files = ['cats.txt', 'music.ogg', 'anotherpilgrim.xls']
# pair each regex with its own file name
matches = [rx.match(f) for rx, f in zip(regexes, input_files)]

# raw string, so \1 and \2 stay group references rather than control characters
outputstr = r'Text file about: \1, audio file about: \2, and another file on \g<name>.'
# should be 'Text file about: cats, audio file about: music, and another file on pilgrim.xls.'

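I'd like to expand outputstr with the union of the regexes (or perhaps a concatenation makes more sense for the \2 reference?). I could join the regexes together, separated by some otherwise unused character: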

final_re = re.compile('\n'.join(re_strings))
final_files = '\n'.join(input_files)
match = final_re.search(final_files)

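But this forces each regex to match the whole file name rather than just part of it. I could put a catch-all group between the files, a la (.*?), but that would certainly mess up the group references, and it might mess up the original patterns (which I don't control). I suppose I could also force named groups everywhere and then merge the .groupdict()s of all the regexes...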

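Python doesn't allow partial expansion: every group reference in the template must be valid for the match being expanded. So a series of expansions, one per match, is not possible either, e.g.: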

for m in matches:
    outputstr = m.expand(outputstr)
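To make the failure concrete, here is a minimal sketch. The single-group pattern and the shortened template are made up for illustration; the point is only that expand() rejects a template as soon as it refers to a group the pattern does not define:

```python
import re

m = re.match(r"(.*)\.txt", "cats.txt")   # this pattern defines only group 1
template = r"Text file about: \1, audio file about: \2"

try:
    m.expand(template)
    failed = False
except (re.error, IndexError) as exc:
    # a reference to a group the pattern lacks is rejected outright,
    # so there is no partially expanded string to pass to the next match
    failed = True
    print("expand failed:", exc)
```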

Thanks for any suggestions!

Answer 1

Just for the record, here is how to combine the results of several regex matches and substitute across all of them.

Given several query strings and several regex patterns:

import re

query_str = ["abcdyyy", "hijkzzz"]
re_pattern = [r"(a)(b)(?P<first_name>c)(d)",
              r"(h)(i)(?P<second_name>j)(k)"]

# match each separately
matches= [re.search(p,q) for p,q in 
          zip(re_pattern, query_str)]

We want to write one replacement string that draws on the results of all the searches:

replacement = r"[\4_\g<first_name>_\2_\1:\5:\6:\8:\g<second_name>]"

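To do this, we need to:

Merge the search results
Substitute a proxy for the merged match (match_substitute)
Provide a proxy object that handles named groups such as "first_name" (pattern_substitute)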

The following code handles this. The expanded string ends up in "result":

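# Note: sre_parse is an undocumented internal module (deprecated since Python 3.11);
# parse_template/expand_template may be missing or behave differently on newer versions.
import sre_parse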

#
#   dummy object to provide group() function and "string" member variable
#
class match_substitute:
    def __init__(self, matches): 
        self.data = []
        for m in matches:
            self.data.extend(m.groups())
        self.string = ""
    # regular expression groups count from 1 not from zero!
    def group(self, ii):
        return self.data[ii - 1]



#
#   dummy object to provide groupindex dictionary for named groups
#
class pattern_substitute:
    def __init__(self, matches, re_pattern): 
        #
        #   Named group support
        #   Increment indices so that they point to the correct position
        #       in the merged list of matching groups
        #
        self.groupindex = dict()
        offset = 0
        for p, m in zip(re_pattern, matches):
            # groupindex is the public name -> group number mapping of a compiled pattern
            for k, v in re.compile(p).groupindex.items():
                self.groupindex[k] = v + offset
            offset += len(m.groups())



match   = match_substitute(matches)
pattern = pattern_substitute(matches, re_pattern)

#
#   parse and substitute
#
template = sre_parse.parse_template(replacement, pattern)
result = sre_parse.expand_template(template, match)
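If relying on re internals is a concern, the same idea can be sketched with only the public API. This is an alternative under stated assumptions, not the approach above: the helper name expand_merged is hypothetical, it only understands \N and \g<name> references (no other escapes), a name that appears in more than one pattern is overwritten by the later one, and unmatched optional groups expand to an empty string.

```python
import re

def expand_merged(template, patterns, strings):
    """Expand template against the concatenated groups of several matches."""
    groups = []   # positional groups of all matches, in order
    named = {}    # named groups of all matches (later names overwrite earlier ones)
    for pat, s in zip(patterns, strings):
        m = re.search(pat, s)
        if m is None:
            raise ValueError("pattern %r did not match %r" % (pat, s))
        groups.extend(m.groups())
        named.update(m.groupdict())

    def lookup(ref):
        # ref is either \<digits> (group 1) or \g<name-or-number> (group 2)
        key = ref.group(1) or ref.group(2)
        if key.isdecimal():
            idx = int(key)
            if idx < 1:
                raise IndexError("group 0 is not defined for merged matches")
            value = groups[idx - 1]   # merged groups count from 1, like regex groups
        else:
            value = named[key]
        return value if value is not None else ""

    # replace every group reference in the template via the lookup callback
    return re.sub(r"\\(\d+)|\\g<(\w+)>", lookup, template)


result = expand_merged(
    r"[\4_\g<first_name>_\2_\1:\5:\6:\8:\g<second_name>]",
    [r"(a)(b)(?P<first_name>c)(d)", r"(h)(i)(?P<second_name>j)(k)"],
    ["abcdyyy", "hijkzzz"],
)
print(result)  # [d_c_b_a:h:i:k:j]
```

Because the callback's return value is inserted literally, matched text that happens to contain backslashes is not reinterpreted as a further group reference.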

Solution 1
0 2013-01-09 11:46:09
