REGEX - Finding a specific XML tag and parsing through it

My XML looks like the following:

<example>
<Test_example>Author%5773637864827/Testing-75873874hdueu47.jpg</Test_example>
<Test_example>Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts</Test_example>
<newtag>

This XML has 100 lines, and I am interested in the <Test_example> tag. Inside this tag, I want to remove everything up to and including the /, and then remove everything from the - up to (but not including) the full stop.
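The two rules can be expressed as a single regex substitution on the text inside one tag. This is a minimal sketch, assuming each value contains a / followed later by a -; clean_value is a hypothetical helper name, not part of the question or answer.

```python
import re

# Hypothetical helper applying the question's two rules to one tag's text:
#   ^[^/]*/   everything up to and including the first '/' is dropped
#   ([^-]*)   the part to keep (e.g. 'Testing'), captured as group 1
#   -[^.]*    the first '-' up to (not including) the next '.' is dropped
# Values without a '/' followed by a '-' do not match and are returned unchanged.
def clean_value(value: str) -> str:
    return re.sub(r'^[^/]*/([^-]*)-[^.]*', r'\1', value)


print(clean_value('Author%5773637864827/Testing-75873874hdueu47.jpg'))       # Testing.jpg
print(clean_value('Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts'))  # Testing245.ts
```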

The end result should be:

<Test_example>Testing.jpg</Test_example>
<Test_example>Testing245.ts</Test_example>

I am a beginner and would love some help with this. I need a regex solution. The code I run before this step is a find-and-replace, as follows:

# Read the whole file and replace every 'new:' with 'ext:'
# (the 'with' block closes the file automatically)
with open('test.xml', 'r') as f:
    content = f.read().replace('new:', 'ext:')
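One way to fold the regex step into that find-and-replace and write the result back to the file is sketched below. It assumes the whole file fits in memory and that every <Test_example> value has the <prefix>/<name>-<suffix>.<ext> shape from the sample; transform, rewrite_file and TEST_EXAMPLE_RE are hypothetical names.

```python
import re

# Matches one <Test_example>...</Test_example> element whose text looks like
# <prefix>/<name>-<suffix>.<ext>. Groups: 1 = opening tag, 2 = <name>,
# 3 = '.<ext>', 4 = closing tag. [^<] keeps each match inside a single element.
TEST_EXAMPLE_RE = re.compile(
    r'(<Test_example>)[^/<]*/([^-/<]*)-[^.<]*(\.[^<]*)(</Test_example>)'
)


def transform(xml_text: str) -> str:
    """Run the existing 'new:' -> 'ext:' replacement, then clean each <Test_example>."""
    xml_text = xml_text.replace('new:', 'ext:')
    # Keep tag + name + extension; text that doesn't match is left unchanged.
    return TEST_EXAMPLE_RE.sub(r'\1\2\3\4', xml_text)


def rewrite_file(path: str) -> None:
    """Hypothetical helper: read the file, transform it, and overwrite it in place."""
    with open(path, 'r') as f:
        content = f.read()
    with open(path, 'w') as f:
        f.write(transform(content))
```

Calling rewrite_file('test.xml') overwrites test.xml, so keep a backup while testing. Running re.sub over the whole text, instead of splitting it into lines and printing only the matches, leaves every other line (such as <example> and <newtag>) as it was.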

Based on your sample data, I came up with the following regex; this is how I tested it:

import re

example_string = """<example>
<Test_example>Author%5773637864827/Testing-75873874hdueu47.jpg</Test_example>
<Test_example>Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts</Test_example>
<newtag>"""

my_list = example_string.split('\n')

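# Raw string so that \S, \d and \. reach the regex engine unchanged
my_regex = re.compile(r'(<Test_example>)\S+%\d+/(\S+)-\S+(\.\S+)(</Test_example>)')

for line in my_list:
    # Search once and reuse the match object instead of calling re.search twice
    match = my_regex.search(line)
    if match:
        # Rebuild the element: opening tag + file name + extension + closing tag
        print(match.group(1) + match.group(2) + match.group(3) + match.group(4))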

Output:

<Test_example>Testing.jpg</Test_example>
<Test_example>Testing245.ts</Test_example>
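Because (\S+)- is greedy, the pattern above keeps everything up to the last hyphen if a file name contains more than one. The sample data has only one hyphen per value, so whether the question's rule means the first hyphen is an assumption. This sketch runs both readings on a made-up value; FIRST_HYPHEN_RE is a hypothetical variant in which [^-/<]* cannot cross a '-'.

```python
import re

# Pattern from the answer: the greedy (\S+) before '-' backtracks to the LAST hyphen.
GREEDY_RE = re.compile(r'(<Test_example>)\S+%\d+/(\S+)-\S+(\.\S+)(</Test_example>)')
# Hypothetical variant: group 2 stops at the FIRST hyphen after the '/'.
FIRST_HYPHEN_RE = re.compile(r'(<Test_example>)[^/<]*/([^-/<]*)-[^.<]*(\.[^<]*)(</Test_example>)')

# Made-up value with two hyphens; not taken from the question's data.
line = '<Test_example>Author%123/My-Test-456abc.jpg</Test_example>'

for name, pattern in (('greedy', GREEDY_RE), ('first hyphen', FIRST_HYPHEN_RE)):
    match = pattern.search(line)
    print(name, ''.join(match.groups()))
# greedy <Test_example>My-Test.jpg</Test_example>
# first hyphen <Test_example>My.jpg</Test_example>
```

Which one is right depends on what the real file names look like; with the sample data, both produce the same output.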


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM