My xml looks like the following :
<example>
<Test_example>Author%5773637864827/Testing-75873874hdueu47.jpg</Test_example>
<Test_example>Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts</Test_example>
This XML has 100 lines and i am interested in the tag " <Test_example>
". In this tag I want to remove everything until it sees a /
and when it sees a -
remove everything until it sees the full stop.
End result should be
<Test_example>Testing.jpg</Test_example>
<Test_example>Testing245.ts</Test_example>
I am a beginner and would love some help on this. I think maybe regex is the best method?
Consider XSLT , the special-purpose language designed to to transform XML files, using its substring-before
and substring-after
functions. Python's third-party module, lxml
, can run XSLT 1.0 scripts. And because XSLT is portable, it can be run in other languages or executables beyond Python:
XSLT (save as .xsl file, a special .xml file)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" encoding="UTF-8"/>
<xsl:strip-space elements="*"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="Test_example">
<xsl:copy>
<xsl:value-of select="concat(substring-before(substring-after(., '/'), '-'),
'.',
substring-after(., '.'))"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Python
import lxml.etree as et
xml = et.parse('Input.xml')
xsl = et.parse('Script.xsl')
transformer = et.XSLT(xsl)
new_xml = transformer(xml)
# PRINT TO CONSOLE
print(new_xml)
# SAVE TO FILE
with open('Output.xml', 'wb') as f:
f.write(new_xml)
Output
<?xml version="1.0" encoding="UTF-8"?>
<example>
<Test_example>Testing.jpg</Test_example>
<Test_example>Testing245.ts</Test_example>
</example>
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.