
How to extract the text data between nodes from an XML file?

<?xml-stylesheet type="text/css" href="home.css"?>
<Header type="text">
  <encodingDesc>
    <samplingDesc>Samples taken from page 10-11,20-21,38-39, 54-55, 70-71, 80-81, 98-99, 122-123, 142-143, 148-149, 162-163, 174-175 </samplingDesc>
  </encodingDesc>
  <sourceDesc>
    <mainContent>
      <source> Abhinesh
        <category>Natural, Physical and Professional Sciences</category>
        <subcategory>Textile Technology</subcategory>
        <text> Book </text>
        <title> cloths </title>
        <vol> 1 </vol>
        <issue/>
      </source>
      <textDes>
        <type/>
        <headline/>
        <author> V. Nurjan </author>
        <translator/>
        <words>3364</words>
      </textDes>
    </mainContent>
  </sourceDesc>
  <profileDesc>
    <creation>
      <date> 21-Dec-2010 </date>
      <inputter> Abhinesh </inputter>
    </creation>
    <langUsage> Telugu </langUsage>
    <textClass>
      <channel mode="w"> print </channel>
      <domain type="public"/>
    </textClass>
  </profileDesc>
</Header>

I have checked many examples on the internet, but they only give code for simple XML files, not nested ones like this. How can I extract the tagged data from such an XML file?

You could use a simple XSL transformation (XSLT) for this. To extract all of the text into a plain-text file, you could use the following XSL stylesheet, which prints the whitespace-normalized text of each element on its own line.

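<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <!-- Drop whitespace-only text nodes (the indentation) from the source document -->
    <xsl:strip-space elements="*" />
    <!-- Write plain text rather than XML/HTML -->
    <xsl:output method="text" encoding="UTF-8" />

    <!-- Matches every node: print its first non-blank text child, one per line -->
    <xsl:template match="node()">
        <xsl:if test="boolean(normalize-space(text()))">
            <xsl:value-of select="normalize-space(text())" /><xsl:text>&#xA;</xsl:text>
        </xsl:if>
        <!-- Recurse into the child nodes in document order -->
        <xsl:apply-templates select="node()"/>
    </xsl:template>
</xsl:stylesheet>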
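If XSLT is not an option, roughly the same extraction can be sketched in Python 3 with the standard library's `xml.etree.ElementTree`. This is an alternative, not part of the XSLT answer; the `SAMPLE` string is an abridged copy of the question's file, and `normalize_space`/`extract_texts` are hypothetical helper names. It reads `elem.text`, which is the text before an element's first child. For this file that matches what the stylesheet picks up, but for mixed content where text appears only after a child element, the two approaches can differ.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the question's file, used only as a test sample.
SAMPLE = """<?xml-stylesheet type="text/css" href="home.css"?>
<Header type="text">
  <sourceDesc>
    <mainContent>
      <source> Abhinesh
        <category>Natural, Physical and Professional Sciences</category>
        <subcategory>Textile Technology</subcategory>
        <issue/>
      </source>
      <textDes>
        <author> V. Nurjan </author>
        <words>3364</words>
      </textDes>
    </mainContent>
  </sourceDesc>
  <profileDesc>
    <langUsage> Telugu </langUsage>
    <textClass>
      <channel mode="w"> print </channel>
      <domain type="public"/>
    </textClass>
  </profileDesc>
</Header>"""


def normalize_space(text):
    """Trim and collapse whitespace, similar to XPath normalize-space()."""
    return " ".join((text or "").split())


def extract_texts(xml_text):
    """Return the non-blank leading text of every element, in document order.

    elem.text is the text before an element's first child, so tails and
    attributes are ignored, much like the stylesheet's text() lookup here.
    """
    root = ET.fromstring(xml_text)  # the xml-stylesheet PI is skipped by default
    lines = []
    for elem in root.iter():  # depth-first, document order, root included
        text = normalize_space(elem.text)
        if text:
            lines.append(text)
    return lines


if __name__ == "__main__":
    print("\n".join(extract_texts(SAMPLE)))
```

Run against the full file from the question (for example via `ET.parse(path).getroot()` instead of `fromstring`), this should produce the same kind of one-value-per-line listing as the stylesheet.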

To run this stylesheet you need an XSLT processor, such as Saxon, or xsltproc if you use a Unix-like operating system.

You can also test it easily in a web browser that supports XSLT, such as Internet Explorer or Firefox.

Save the stylesheet in the same folder as your XML source file, for example as test.xsl, and then change the xml-stylesheet processing instruction at the top of your XML file from

<?xml-stylesheet type="text/css" href="home.css"?>

to

<?xml-stylesheet type="text/xsl" href="test.xsl"?>

The output will then look like this:

Samples taken from page 10-11,20-21,38-39, 54-55, 70-71, 80-81, 98-99, 122-123, 142-143, 148-149, 162-163, 174-175
Abhinesh
Natural, Physical and Professional Sciences
Textile Technology
Book
cloths
1
V. Nurjan
3364
21-Dec-2010
Abhinesh
Telugu
print
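The output above drops the tag names and ignores attributes such as `mode="w"` on `<channel>` or `type="public"` on `<domain>`. If you need specific fields, one option is to look each one up by path. The sketch below assumes Python 3 and uses `Element.find` with ElementPath expressions from the standard library; the `SAMPLE` string is an abridged copy of the question's `<profileDesc>` section, and the helper names and dictionary keys are hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the question's <profileDesc> section (test sample only).
SAMPLE = """<Header type="text">
  <profileDesc>
    <creation>
      <date> 21-Dec-2010 </date>
      <inputter> Abhinesh </inputter>
    </creation>
    <langUsage> Telugu </langUsage>
    <textClass>
      <channel mode="w"> print </channel>
      <domain type="public"/>
    </textClass>
  </profileDesc>
</Header>"""


def text_at(root, path):
    """Normalized text of the first element matching an ElementPath, or None."""
    elem = root.find(path)
    if elem is None:
        return None
    return " ".join((elem.text or "").split()) or None


def attr_at(root, path, name):
    """Value of attribute `name` on the first element matching `path`, or None."""
    elem = root.find(path)
    return None if elem is None else elem.get(name)


def extract_record(xml_text):
    """Collect selected fields into a dict; missing elements map to None."""
    root = ET.fromstring(xml_text)  # root is <Header>; paths are relative to it
    # Hypothetical key names chosen for this sketch.
    return {
        "date": text_at(root, "profileDesc/creation/date"),
        "inputter": text_at(root, "profileDesc/creation/inputter"),
        "language": text_at(root, "profileDesc/langUsage"),
        "channel": text_at(root, "profileDesc/textClass/channel"),
        "channel_mode": attr_at(root, "profileDesc/textClass/channel", "mode"),
        "domain_type": attr_at(root, "profileDesc/textClass/domain", "type"),
        # Not present in the abridged sample, so this comes back as None.
        "author": text_at(root, "sourceDesc/mainContent/textDes/author"),
    }


if __name__ == "__main__":
    for key, value in extract_record(SAMPLE).items():
        print(f"{key}: {value}")
```

Looking fields up by path keeps each value tied to its tag and makes a missing element visible as `None`, at the cost of having to list the paths you care about.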
