
Regex: finding a specific XML tag and removing text between specific points

My XML looks like the following:

<example>
<Test_example>Author%5773637864827/Testing-75873874hdueu47.jpg</Test_example>
<Test_example>Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts</Test_example>
</example>

The XML has 100 lines, and I am interested in the <Test_example> tag. Within this tag, I want to remove everything up to and including the first /, and then, from the - onward, remove everything up to (but not including) the full stop.

The end result should be:

<Test_example>Testing.jpg</Test_example>
<Test_example>Testing245.ts</Test_example>

I am a beginner and would appreciate some help with this. Is regex the best approach?

Consider XSLT, the special-purpose language designed to transform XML files, using its substring-before and substring-after functions. Python's third-party module lxml can run XSLT 1.0 scripts. And because XSLT is portable, the same script can also be run from other languages or standalone XSLT processors besides Python:

XSLT (save as an .xsl file, which is itself a special kind of XML file)

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" encoding="UTF-8"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Test_example">
    <xsl:copy>
      <xsl:value-of select="concat(substring-before(substring-after(., '/'), '-'), 
                                   '.',
                                   substring-after(., '.'))"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
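To see what the concat(...) expression in the Test_example template computes, here is a small Python sketch that mimics the XPath 1.0 substring-before and substring-after functions (both use the first occurrence of the separator and return an empty string when it is absent) and applies them to the two sample values. The helper names are hypothetical; this only traces the logic and is not how lxml evaluates XPath:

```python
def substring_before(s, sep):
    """Mimics XPath 1.0 substring-before: text before the first sep, or '' if sep is absent."""
    index = s.find(sep)
    return s[:index] if index != -1 else ""


def substring_after(s, sep):
    """Mimics XPath 1.0 substring-after: text after the first sep, or '' if sep is absent."""
    index = s.find(sep)
    return s[index + len(sep):] if index != -1 else ""


def evaluate_expression(value):
    # Mirrors: concat(substring-before(substring-after(., '/'), '-'), '.', substring-after(., '.'))
    name = substring_before(substring_after(value, "/"), "-")
    extension = substring_after(value, ".")
    return name + "." + extension


for value in [
    "Author%5773637864827/Testing-75873874hdueu47.jpg",
    "Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts",
]:
    print(value, "->", evaluate_expression(value))
```

One consequence worth knowing: substring-after(., '.') uses the first full stop in the whole value, so a dot before the / (for example a hypothetical Auth.or/Testing-x.jpg) would produce Testing.or/Testing-x.jpg. If that can occur in your data, substring-after(substring-after(., '/'), '.') is a safer way to take the extension, though it still assumes the only dot after the / is the one before the extension.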

Python

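import lxml.etree as et

xml = et.parse('Input.xml')    # the source XML document
xsl = et.parse('Script.xsl')   # the stylesheet above

transformer = et.XSLT(xsl)
new_xml = transformer(xml)

# PRINT TO CONSOLE
print(new_xml)

# SAVE TO FILE (bytes() serializes the result using the stylesheet's xsl:output settings)
with open('Output.xml', 'wb') as f:
    f.write(bytes(new_xml))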

Output

<?xml version="1.0" encoding="UTF-8"?>
<example>
   <Test_example>Testing.jpg</Test_example>
   <Test_example>Testing245.ts</Test_example>
</example>
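If you prefer the regex route from the question, a safer way is to apply the regex to each element's text rather than to the raw XML, so the tags themselves are never touched. Below is a minimal sketch using only Python's standard library (xml.etree.ElementTree and re). It assumes every value has the shape prefix/name-hash.ext; the pattern, the SAMPLE string, and the helper names clean_value and transform are illustrative assumptions, not part of the original answer:

```python
import re
import xml.etree.ElementTree as ET

# Sample input from the question, with a closing </example> tag assumed.
SAMPLE = """<example>
<Test_example>Author%5773637864827/Testing-75873874hdueu47.jpg</Test_example>
<Test_example>Auth0r%5773637864827/Testing245-75873874hdu6543u47.ts</Test_example>
</example>"""

# Assumed shape of each value: <prefix>/<name>-<hash>.<extension>
# Group 1 captures the name (up to the first '-'), group 2 the extension.
PATTERN = re.compile(r"^[^/]*/([^-]*)-[^.]*\.(.*)$")


def clean_value(text):
    """Hypothetical helper: return 'name.ext', or the text unchanged if it does not match."""
    match = PATTERN.match(text)
    if match is None:
        return text
    return f"{match.group(1)}.{match.group(2)}"


def transform(xml_text):
    """Hypothetical helper: rewrite the text of every <Test_example> element."""
    root = ET.fromstring(xml_text)
    # Only element text is rewritten; the markup is handled by the XML parser.
    for elem in root.iter("Test_example"):
        if elem.text:
            elem.text = clean_value(elem.text)
    return ET.tostring(root, encoding="unicode")


print(transform(SAMPLE))
```

Values that do not fit the assumed shape are left unchanged, which is a deliberate choice in this sketch. Unlike the XSLT's indent="yes", ElementTree keeps the original whitespace rather than re-indenting the output.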

