
Python regular expression substitution with a matched group

I'm trying to substitute the channel name in AndroidManifest.xml so that I can batch-generate a group of channel APK packages for release. The line I need to change in the XML file is:

<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>

The channel configs are saved in a config file, something like this:

channel_name    output_postfix  valid 
"androidmarket" "androidmarket" true
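A config in this shape can be read with the standard library's shlex.split, which honours the double quotes around each field. This is a minimal sketch, assuming the first row is the header, fields are whitespace-separated, and only rows whose valid column is true should be built; parse_channel_config and the second sample row ("examplestore") are hypothetical.

```python
import shlex

# Hypothetical sample in the format shown above: a header row, then one
# whitespace-separated row per channel with double-quoted string fields.
CONFIG_TEXT = '''channel_name    output_postfix  valid
"androidmarket" "androidmarket" true
"examplestore" "example_store" false
'''

def parse_channel_config(text):
    """Return (channel_name, output_postfix) pairs for rows marked valid."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    channels = []
    for line in lines[1:]:
        # shlex.split removes the surrounding double quotes from each field
        row = dict(zip(header, shlex.split(line)))
        if row["valid"].lower() == "true":
            channels.append((row["channel_name"], row["output_postfix"]))
    return channels

print(parse_channel_config(CONFIG_TEXT))  # [('androidmarket', 'androidmarket')]
```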

Here is what I tried:

import re

manifest_original_xml_fh = open("../AndroidManifest_original.xml", "r")
manifest_xml_fh = open("../AndroidManifest.xml", "w")
pattern = re.compile(r'<meta-data\sandroid:value=\"(.*)\"\sandroid:name=\"UMENG_CHANNEL\".*')
for each_config_line in manifest_original_xml_fh:
    # channel_name comes from the config file, e.g. "androidmarket"
    each_config_line = re.sub(pattern, channel_name, each_config_line)
    print(each_config_line)

It replaces the whole <meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/> with androidmarket, which is obviously not what I need. I then figured out that pattern.match(each_config_line) returns a match object, and one of its groups is "CHANNEL_NAME_TO_BE_DETERMINED". I've also tried passing a replacement function, but that failed too.

So, since I've successfully found the pattern, how can I replace only the matched group correctly?

I suggest a different approach: save your xml as a template, with placeholders to be replaced with standard Python string operations.

For example:

AndroidManifest_template.xml:

<meta-data android:value="%(channel_name)s" android:name="UMENG_CHANNEL"/>

Python:

# Fill every %(channel_name)s placeholder and write the result out.
# Any literal % elsewhere in the template must be written as %%.
with open("../AndroidManifest_template.xml", "r") as template_fh:
    template = template_fh.read()
with open("../AndroidManifest.xml", "w") as manifest_xml_fh:
    manifest_xml_fh.write(template % {'channel_name': channel_name})
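To check the template approach without touching any files, the substitution step can be isolated in a small function. This is a sketch; render_manifest and the one-line TEMPLATE are illustrative stand-ins for the real template file. It also shows why a literal percent sign in the template has to be doubled.

```python
# A one-line stand-in for the contents of AndroidManifest_template.xml
TEMPLATE = '<meta-data android:value="%(channel_name)s" android:name="UMENG_CHANNEL"/>\n'

def render_manifest(template_text, channel_name):
    # %-formatting with a dict fills every %(channel_name)s placeholder;
    # a literal % in the template must be escaped as %% or this raises an error
    return template_text % {"channel_name": channel_name}

print(render_manifest(TEMPLATE, "androidmarket"))
print(render_manifest("100%% done for %(channel_name)s", "androidmarket"))  # 100% done for androidmarket
```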

To capture just the value of the meta-data tag you need to change the regex:

<meta-data\sandroid:value=\"([^"]*)\"\sandroid:name=\"UMENG_CHANNEL\".*

Specifically I changed this part:

\"(.*)\" - this is a greedy match, so it matches as many characters as possible as long as the rest of the expression still matches

to

\"([^"]*)\" - which matches only characters that are not a double quote. The matched value is still in the first capturing group.
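The difference is easiest to see with a deliberately simplified pattern that has nothing after the closing quote to constrain it (the UMENG_CHANNEL part is left out here for illustration only). The greedy version runs on to the last double quote in the line; the negated character class stops at the first one.

```python
import re

line = '<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>'

# Simplified patterns (no android:name anchor) to isolate greedy vs [^"]*
greedy = re.search(r'value="(.*)"', line).group(1)
negated = re.search(r'value="([^"]*)"', line).group(1)

print(greedy)   # CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL
print(negated)  # CHANNEL_NAME_TO_BE_DETERMINED
```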

If you want to do the replacement, a better idea is to capture the parts you want to keep unchanged. I'm not a Python expert, but something like this should work:

import re
# Groups 1 and 2 capture the text around the value; \1 and \2 put it back
s = re.sub(r'(<meta-data\sandroid:value=\")[^"]*(\"\sandroid:name=\"UMENG_CHANNEL\".*)',
           r'\1YourNewValue\2', s)

\1 is backreference 1, i.e. it inserts whatever the first capturing group matched.
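When the new value comes from a variable (such as channel_name read from the config), the replacement string has to be assembled at run time. A sketch under that assumption, with set_channel as a hypothetical helper name: the \g<1> / \g<2> spelling of the backreferences is used so that a channel name beginning with a digit cannot be misread as part of the group number. It assumes channel_name contains no backslashes, since re.sub still processes escapes in a string replacement.

```python
import re

PATTERN = re.compile(r'(<meta-data\sandroid:value=")[^"]*("\sandroid:name="UMENG_CHANNEL".*)')

def set_channel(line, channel_name):
    # \g<1> and \g<2> re-insert the captured text before and after the value.
    # Assumes channel_name has no backslashes (they would be treated as escapes).
    return PATTERN.sub(r'\g<1>' + channel_name + r'\g<2>', line)

line = '<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>'
print(set_channel(line, "androidmarket"))
# <meta-data android:value="androidmarket" android:name="UMENG_CHANNEL"/>
```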

I think your misunderstanding is this: everything the pattern matches gets replaced. If you want to keep parts of the matched text, you have to capture them and reinsert them in the replacement string.
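The question mentions trying a replacement function. The same capture-and-reinsert idea works with one: re.sub calls the function with each match object and substitutes its return value. A minimal sketch (set_channel_with_function is a hypothetical name); a side benefit is that the returned string is inserted literally, with no backslash processing.

```python
import re

PATTERN = re.compile(r'(<meta-data\sandroid:value=")([^"]*)("\sandroid:name="UMENG_CHANNEL")')

def set_channel_with_function(line, channel_name):
    def replace(match):
        # Rebuild the matched text: keep groups 1 and 3, swap out group 2
        return match.group(1) + channel_name + match.group(3)
    # Text outside the match (here the trailing "/>") is left untouched
    return PATTERN.sub(replace, line)

line = '<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>'
print(set_channel_with_function(line, "androidmarket"))
# <meta-data android:value="androidmarket" android:name="UMENG_CHANNEL"/>
```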

Or match only what you want to replace by using lookaround assertions.

Try this:

import re
# The lookbehind requires the value=" prefix but leaves it out of the match
pattern = re.compile(r'(?<=<meta-data\sandroid:value=\")[^"]+')
for each_config_line in manifest_original_xml_fh:
    each_config_line = pattern.sub(channel_name, each_config_line)

(?<=<meta-data\sandroid:value=\") is a positive lookbehind assertion: it requires this text to appear immediately before the match but does not include it in the match, so it is not replaced.

[^"]+ then matches one or more characters that are not a ".

See it here on Regexr
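The lookbehind pattern above replaces the value of any meta-data tag written in that form, not only UMENG_CHANNEL. If the manifest has other meta-data entries, a lookahead can be added on the other side as well. This is a sketch of that variant, not part of the original answer; the UMENG_APPKEY line and its value are hypothetical sample data. Neither assertion consumes text, so only the value itself is replaced.

```python
import re

# Lookbehind from the answer, plus a lookahead requiring the UMENG_CHANNEL name
PATTERN = re.compile(r'(?<=<meta-data\sandroid:value=")[^"]+(?="\sandroid:name="UMENG_CHANNEL")')

def set_channel(line, channel_name):
    return PATTERN.sub(channel_name, line)

umeng = '<meta-data android:value="CHANNEL_NAME_TO_BE_DETERMINED" android:name="UMENG_CHANNEL"/>'
other = '<meta-data android:value="abc123" android:name="UMENG_APPKEY"/>'  # hypothetical
print(set_channel(umeng, "androidmarket"))
# <meta-data android:value="androidmarket" android:name="UMENG_CHANNEL"/>
print(set_channel(other, "androidmarket"))
# <meta-data android:value="abc123" android:name="UMENG_APPKEY"/>  (unchanged)
```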

