
Remove data from XML file in DOM?

Is there an easy way (perhaps using the DOM API, or something else) to remove the actual data from an XML file, leaving behind just a kind of template of its schema, so that we can see what information it can potentially hold?

I will give an example, to make this clear.

Suppose the user inputs the following XML file:

<photos page="2" pages="89" perpage="10" total="881">
    <photo id="2636" owner="47058503995@N01" 
        secret="a123456" server="2" title="test_04"
        ispublic="1" isfriend="0" isfamily="0" />
    <photo id="2635" owner="47058503995@N01"
        secret="b123456" server="2" title="test_03"
        ispublic="0" isfriend="1" isfamily="1" />
    <photo id="2633" owner="47058503995@N01"
        secret="c123456" server="2" title="test_01"
        ispublic="1" isfriend="0" isfamily="0" />
    <photo id="2610" owner="12037949754@N01"
        secret="d123456" server="2" title="00_tall"
        ispublic="1" isfriend="0" isfamily="0" />
</photos>

Then I want to transform this into:

<photos page="..." pages="..." perpage="..." total="...">
    <photo id="..." owner="..."
        secret="..." server="..." title="..."
        ispublic="..." isfriend="..." isfamily="..." />
</photos>

I'm sure this could be written manually, but what would be the best, most efficient and reliable way of doing this (preferably in Java)?

Thanks!

Rather than use the DOM API, with which you'd have to iterate across the structure yourself, take a look at the SAX API, which does the iteration itself and calls you back for each element, text node, etc. Each element callback also gives you that element's set of attributes.

You'd still have to determine what to output, remove duplicates, etc. But you get a callback for the end of each element as well, so you could record everything you are given and then, in the end-of-element callback, determine the unique set of data you wish to output.
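A minimal sketch of the SAX idea, under some simplifying assumptions: it records attribute names in the start-of-element callback (rather than at end-of-element), and it only builds a flat map from element name to the attribute names seen, without preserving nesting. The class and method names (`SaxSkeleton`, `Collector`, `collect`) are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxSkeleton {

    /** Collects, per element name, every attribute name seen on any occurrence. */
    public static class Collector extends DefaultHandler {
        public final Map<String, Set<String>> attributesByElement = new LinkedHashMap<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // One shared set per element name, so repeated <photo> elements are merged.
            Set<String> names =
                    attributesByElement.computeIfAbsent(qName, k -> new LinkedHashSet<>());
            for (int i = 0; i < atts.getLength(); i++) {
                names.add(atts.getQName(i)); // keep the name, ignore the value
            }
        }
    }

    /** Parses the XML text and returns element name -> attribute names. */
    public static Map<String, Set<String>> collect(String xml) throws Exception {
        Collector collector = new Collector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        return collector.attributesByElement;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<photos page=\"2\" pages=\"89\">"
                + "<photo id=\"2636\" title=\"test_04\"/>"
                + "<photo id=\"2635\" title=\"test_03\"/>"
                + "</photos>";
        // Prints one line per distinct element name with its attribute names.
        collect(xml).forEach((element, attrs) -> System.out.println(element + " " + attrs));
    }
}
```

Turning that map back into a template document, and tracking which element is nested in which, is left out here; a stack of open elements maintained in `startElement`/`endElement` would be one way to do it.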

There are plenty of possibilities:

  • DOM API (included in JDK)
  • SAX API (included in JDK)
  • JDOM (easy to use, but external)
  • XSLT (transforming XML with prepared XSL stylesheet, JDK supports XSLT 1.0)
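For comparison, the first option in this list (plain DOM) might look roughly like the sketch below. It is only an illustration: the names `DomSkeleton`, `strip` and `skeleton` are hypothetical, and it makes the simplifying assumption that the first element with a given name is representative of its same-named siblings, so attributes that appear only on later siblings are lost.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomSkeleton {

    /** Blanks attribute values and keeps one child element per distinct name. */
    public static void strip(Element element) {
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            attrs.item(i).setNodeValue("...");
        }
        Set<String> seen = new HashSet<>();
        Node child = element.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child instanceof Element && seen.add(child.getNodeName())) {
                strip((Element) child);         // first element with this name: keep it
            } else {
                element.removeChild(child);     // text, comments, repeated elements
            }
            child = next;
        }
    }

    /** Parses the XML text, strips it in place and serializes the result. */
    public static String skeleton(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        strip(doc.getDocumentElement());
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(skeleton(
                "<photos page=\"2\"><photo id=\"2636\"/><photo id=\"2635\"/></photos>"));
    }
}
```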

I think that XSLT is the most reliable and universal way to transform XML into other XML. Here is a quick example:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:strip-space elements="*"/>
    <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>

    <xsl:template match="node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()[position()=1]"/>
        </xsl:copy>     
    </xsl:template>

    <xsl:template match="@*">
        <xsl:attribute name="{name()}">...</xsl:attribute>
    </xsl:template>
</xsl:stylesheet>

Result:

<photos page="..." pages="..." perpage="..." total="...">
   <photo id="..." owner="..." secret="..." server="..." title="..." ispublic="..."
          isfriend="..."
          isfamily="..."/>
</photos>
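Since the question asks for Java, here is one way the stylesheet above could be applied with the JDK's built-in `javax.xml.transform` API. This is a sketch, not the only way to do it: the class name `XsltSkeleton` and the `STYLESHEET` constant are illustrative, and the stylesheet is embedded as a string only to keep the example self-contained (in practice it would more likely be loaded from a file).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSkeleton {

    // Same templates as the stylesheet shown above, inlined as a string.
    public static final String STYLESHEET =
            "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
            + "<xsl:strip-space elements=\"*\"/>"
            + "<xsl:output method=\"xml\" indent=\"yes\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"node()\">"
            + "<xsl:copy>"
            + "<xsl:apply-templates select=\"@*|node()[position()=1]\"/>"
            + "</xsl:copy>"
            + "</xsl:template>"
            + "<xsl:template match=\"@*\">"
            + "<xsl:attribute name=\"{name()}\">...</xsl:attribute>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    /** Compiles the stylesheet and applies it to the given XML text. */
    public static String transform(String stylesheet, String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<photos page=\"2\" pages=\"89\">"
                + "<photo id=\"2636\" title=\"test_04\"/>"
                + "<photo id=\"2635\" title=\"test_03\"/>"
                + "</photos>";
        System.out.println(transform(STYLESHEET, xml));
    }
}
```

Note that the stylesheet keeps only the first child of each node, so it assumes the first `photo` element is representative of the rest.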

There are heaps of XML parsers available that you can use to do this job. If you are interested in learning, try XmlBeans or JAXB. These APIs give you a great deal of control and validation. Plus, you get to learn XSD and how to generate Java classes from an XSD. Parsing and writing XML files is also fairly easy with these APIs. Here are some useful links:

XmlBeans

JAXB 2.0
