
Python regular expression substitution with matched group

I'm trying to substitute the channel name in AndroidManifest.xml so that I can batch-generate a set of channel APK packages for release.

The line to change in the XML file is <meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>.

The channel configs are saved in a config file, something like:

channel_name    output_postfix  valid 
"androidmarket" "androidmarket" true

Here is what I tried:

import re

manifest_original_xml_fh = open("../AndroidManifest_original.xml", "r")
manifest_xml_fh = open("../AndroidManifest.xml", "w")
pattern = re.compile(r'<meta-data\sandroid:value=\"(.*)\"\sandroid:name=\"UMENG_CHANNEL\".*')
for each_config_line in manifest_original_xml_fh:
    # channel_name is read from the config file shown above
    each_config_line = re.sub(pattern, channel_name, each_config_line)
    print(each_config_line)

It replaces the whole <meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/> with androidmarket, which is obviously not what I need. I then figured out that pattern.match(each_config_line) returns a match object, and that one of its groups is "CHANNEL_NAME_TO_BE_DETERMINED". I've also tried passing a replacement function, but that failed too.

So, since the pattern is matching successfully, how can I replace just the matched group correctly?

I suggest a different approach: save your XML as a template, with placeholders to be replaced using standard Python string operations.

E.g.

AndroidManifest_template.xml:

<meta-data android:value="%(channel_name)s" android:name="UMENG_CHANNEL"/>

python:

# channel_name comes from the channel config file
with open("../AndroidManifest_template.xml", "r") as template_fh, \
        open("../AndroidManifest.xml", "w") as manifest_xml_fh:
    for each_config_line in template_fh:
        # Fill in %(channel_name)s; a literal % in the template must be written as %%
        each_config_line = each_config_line % {'channel_name': channel_name}
        manifest_xml_fh.write(each_config_line)
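
A self-contained sketch of the same idea on an in-memory string, assuming the template contains only the one `%(channel_name)s` placeholder. The `fill_template` helper name is made up for illustration and is not part of the original answer.

```python
def fill_template(template_text, channel_name):
    """Substitute the %(channel_name)s placeholder using % formatting."""
    return template_text % {'channel_name': channel_name}


# One line of the hypothetical AndroidManifest_template.xml
template_line = '<meta-data android:value="%(channel_name)s" android:name="UMENG_CHANNEL"/>'

filled = fill_template(template_line, 'androidmarket')
print(filled)
```

With `%` formatting, any literal percent sign elsewhere in the template has to be doubled (`%%`), otherwise the formatting call raises an error.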

To capture just the value of the meta-data tag, you need to change the regex:

<meta-data\sandroid:value=\"([^"]*)\"\sandroid:name=\"UMENG_CHANNEL\".*

Specifically, I changed this part:

\"(.*)\" - this is a greedy match, so it will match as many characters as possible as long as the rest of the expression still matches

to

\"([^"]*)\" - which matches any run of characters that are not double quotes. The result is still in the first capturing group.
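
A small sketch of the difference between the two patterns. On the single-tag line from the question both capture the same text; the greedy version only misbehaves when more text that fits the tail of the pattern follows, so the two-tag line below is a made-up input chosen to show that.

```python
import re

greedy = re.compile(r'<meta-data\sandroid:value="(.*)"\sandroid:name="UMENG_CHANNEL"')
negated = re.compile(r'<meta-data\sandroid:value="([^"]*)"\sandroid:name="UMENG_CHANNEL"')

# Hypothetical input: two tags on the same line
line = ('<meta-data android:value="A" android:name="UMENG_CHANNEL"/>'
        '<meta-data android:value="B" android:name="UMENG_CHANNEL"/>')

greedy_value = greedy.search(line).group(1)    # runs on into the second tag
negated_value = negated.search(line).group(1)  # stops at the first closing quote

print(repr(greedy_value))
print(repr(negated_value))
```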

If you want to do the replacement, a better idea might be to capture what you want to stay the same. I'm not a Python expert, but something like this would probably work:

re.sub(r'(<meta-data\sandroid:value=\")[^"]*(\"\sandroid:name=\"UMENG_CHANNEL\".*)'
, r'\1YourNewValue\2', s)

\1 is backreference 1, i.e. it is replaced by whatever the first capturing group matched; \2 does the same for the second group.

I think your misunderstanding is this: everything that the pattern matches gets replaced. If you want to keep part of the matched text, you have to capture it and reinsert it in the replacement string.
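
A minimal sketch of capture-and-reinsert, written with a replacement function instead of a `\1...\2` template. The `set_channel` name and the `360market` channel are illustrative assumptions. The function form is used here because in a template string a new value that starts with a digit would run into the group number (the `re` documentation recommends `\g<1>` for that case), whereas a function inserts the value literally.

```python
import re

# Group 1 and group 2 are the parts that must stay unchanged.
TAG = re.compile(r'(<meta-data\sandroid:value=")[^"]*("\sandroid:name="UMENG_CHANNEL")')


def set_channel(line, channel_name):
    """Replace only the value between the two captured parts."""
    return TAG.sub(lambda m: m.group(1) + channel_name + m.group(2), line)


original = ('<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" '
            'android:name="UMENG_CHANNEL"/>')

print(set_channel(original, 'androidmarket'))
print(set_channel(original, '360market'))
```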

Or match only what you want to replace by using lookaround assertions.

Try this:

pattern = re.compile(r'(?<=<meta-data\sandroid:value=\")[^"]+')
for each_config_line in manifest_original_xml_fh:
    each_config_line = re.sub(pattern, channel_name, each_config_line)

(?<=<meta-data\sandroid:value=\") is a positive lookbehind assertion: it ensures that this text comes immediately before the match, but it is not part of the match (so it will not be replaced).

[^"]+ will then match one or more characters that are not a ".
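
A runnable sketch of the lookaround approach. It goes one step beyond the pattern above by adding a lookahead, which is an assumption about what is wanted: without it, the value of any meta-data tag that starts with android:value would be replaced, not only the UMENG_CHANNEL one. The `set_channel_value` name and the `OTHER_KEY` tag are made up for illustration. Python's `re` module requires a lookbehind to match text of fixed length, which this one does.

```python
import re

VALUE = re.compile(
    r'(?<=<meta-data\sandroid:value=")'       # must come right before the match
    r'[^"]+'                                  # the value itself: the only text replaced
    r'(?="\sandroid:name="UMENG_CHANNEL")'    # must come right after the match
)


def set_channel_value(line, channel_name):
    """Swap in channel_name, leaving the surrounding tag untouched."""
    return VALUE.sub(lambda m: channel_name, line)


umeng_line = ('<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" '
              'android:name="UMENG_CHANNEL"/>')
other_line = '<meta-data android:value="abc" android:name="OTHER_KEY"/>'

print(set_channel_value(umeng_line, 'androidmarket'))
print(set_channel_value(other_line, 'androidmarket'))
```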

