My XML looks like the following:
<example>
<Test_example>Author%5773637864827/Testing-75873874hdueu47.jpg</Test_example>
<Test_example>Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts</Test_example>
<newtag>
This XML has 100 lines, and I am interested in the <Test_example> tag. Inside this tag I want to remove everything up to and including the /, and then remove everything from the - up to (but not including) the full stop.
The end result should be:
<Test_example>Testing.jpg</Test_example>
<Test_example>Testing245.ts</Test_example>
I am a beginner and would love some help with this. I need a regex solution. The code I have running before this is a find and replace, as follows:
with open('test.xml', 'r') as f:
    onw = f.read().replace('new:', 'ext:')
Based on your sample data, I came up with the following regex. This is how I tested it:
import re
example_string = """<example>
<Test_example>Author%5773637864827/Testing-75873874hdueu47.jpg</Test_example>
<Test_example>Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts</Test_example>
<newtag>"""
my_list = example_string.split('\n')
# Groups: 1 = opening tag, 2 = name before the "-", 3 = extension, 4 = closing tag
my_regex = re.compile(r'(<Test_example>)\S+%\d+/(\S+)-\S+(\.\S+)(</Test_example>)')
for line in my_list:
    match = my_regex.search(line)
    if match:
        print(match.group(1) + match.group(2) + match.group(3) + match.group(4))
Output:
<Test_example>Testing.jpg</Test_example>
<Test_example>Testing245.ts</Test_example>
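To rewrite the whole file contents instead of printing the matching lines, the same idea can be expressed with re.sub. This is only a minimal sketch: the function name clean_test_examples and the constant TEST_EXAMPLE_RE are made up for illustration, and the pattern assumes each <Test_example> element sits on one line, contains no nested tags, and has a / followed later by a - before the extension. The pattern is also slightly looser than the one above, since it does not require the % and digits before the /.

```python
import re

# Groups: 1 = opening tag, 2 = name between "/" and "-",
# 3 = extension (including the dot), 4 = closing tag.
# Assumption: no nested tags inside <Test_example>.
TEST_EXAMPLE_RE = re.compile(
    r'(<Test_example>)[^<>/]*/([^<>/-]+)-[^<>.]*(\.[^<>]+)(</Test_example>)'
)

def clean_test_examples(xml_text):
    """Return xml_text with every matching <Test_example> value shortened.

    Text that does not match the pattern is left untouched.
    """
    return TEST_EXAMPLE_RE.sub(r'\1\2\3\4', xml_text)

sample = """<example>
<Test_example>Author%5773637864827/Testing-75873874hdueu47.jpg</Test_example>
<Test_example>Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts</Test_example>
<newtag>"""

print(clean_test_examples(sample))
```

In the script from the question, this function could be called on the string returned by f.read().replace('new:', 'ext:'), and the result written back to a file.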