Java: Having trouble parsing XML with nested nodes

I have an XML file that looks something like this:

<album>
    <title> Sample Album </title>
    <year> 2014 </year>
    <musicalStyle> Waltz </musicalStyle>
        <song> Track 1 </song>
        <song> Track 2 </song>
        <song> Track 3 </song>
        <song> Track 4 </song>
        <song> Track 5 </song>
        <song> Track 6 </song>
        <song> Track 7 </song>
</album>

I was able to parse the song by following a tutorial, but now I'm stuck with the nested nodes. In the code below, Song.XMLtitleStartTag is <title> and Song.XMLtitleEndTag is </title>.

// Needs java.io.BufferedReader, java.io.File, java.io.FileReader and java.io.IOException
public static SongList parseFromFile(File inputFile) {
    System.out.println("Parse File Data:");
    if (inputFile == null) return null;
    SongList theSongs = new SongList();

    // try-with-resources closes the reader even if reading fails
    try (BufferedReader inputFileReader = new BufferedReader(new FileReader(inputFile))) {
        String inputLine; // current input line
        while ((inputLine = inputFileReader.readLine()) != null) {
            // Trim once so the start and end checks see the same text
            String line = inputLine.trim();
            if (line.startsWith(Song.XMLtitleStartTag) &&
                    line.endsWith(Song.XMLtitleEndTag)) {

                // Text between the start tag and the end tag
                String titleString = line.substring(Song.XMLtitleStartTag.length(),
                        line.length() - Song.XMLtitleEndTag.length()).trim();

                if (titleString.length() > 0)
                    theSongs.add(new Song(titleString));
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return theSongs;
}

I understand there are different ways to parse XML. I was wondering whether I should stick to the method I'm using and build on it, or try a different, easier approach.

I'd also appreciate a pointer on parsing the rest of the album information, if possible.

The short answer is yes: you should drop your current approach and use something else. Many hundreds of developer hours have gone into producing libraries that can parse XML files in a standardised manner.

There are any number of libraries available for parsing XML.

You could start by taking a look at the built-in APIs, the Java API for XML Processing (JAXP).

Generally it comes down to two approaches: SAX or DOM.

SAX is basically inline processing of the XML as it's parsed. This means that, as the XML document is being read, the parser calls back into your code for each element it encounters, giving you the opportunity to handle it there and then. This is good for large documents and when you only need linear access to the content.
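As a rough illustration of the SAX style, the sketch below collects the text of every <song> element while the document streams past. The class name SaxSongSketch and the method parseSongs are made up for this example, and the XML in main is a shortened copy of the album from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name; a sketch of SAX-style processing, not a full album parser.
final class SaxSongSketch {

    // Returns the trimmed text of every <song> element, in document order.
    static List<String> parseSongs(InputStream in) throws Exception {
        final List<String> songs = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <song>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("song".equals(qName)) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one element's text in several chunks, so append
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("song".equals(qName)) {
                    songs.add(text.toString().trim());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return songs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<album><title> Sample Album </title>"
                + "<song> Track 1 </song><song> Track 2 </song></album>";
        System.out.println(
                parseSongs(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))));
    }
}
```

Nothing is kept in memory apart from the list being built, which is why this style suits large files.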

DOM (or Document Object Model) generates a model of the XML, which you can process at your leisure. Because the entire model is normally read into memory, it's better suited to smaller XML documents and to cases where you want to interact with the document in a non-linear fashion, such as searching.

The following is a simple snippet of loading an XML document into a DOM...

try {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    try {
        Document doc = builder.parse(new File("Album.xml"));
    } catch (SAXException | IOException ex) {
        ex.printStackTrace();
    }
} catch (ParserConfigurationException exp) {
    exp.printStackTrace();
}

Once you have the Document, you are ready to process it in any way you see fit. To my mind, I'd take a look at XPath, which is a query language for XML.

For example...

import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class SongList {

    public static void main(String[] args) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            try {
                Document doc = builder.parse(new File("Album.xml"));

                XPathFactory xPathFactory = XPathFactory.newInstance();
                XPath xPath = xPathFactory.newXPath();

                // Find all album tags starting at the root level
                XPathExpression xExpress = xPath.compile("/album");
                NodeList nl = (NodeList)xExpress.evaluate(doc.getDocumentElement(), XPathConstants.NODESET);
                for (int index = 0; index < nl.getLength(); index++) {

                    Node albumNode = nl.item(index);
                    // Find the title node that is a child of the albumNode
                    Node titleNode = (Node) xPath.compile("title").evaluate(albumNode, XPathConstants.NODE);
                    System.out.println(titleNode.getTextContent());

                }

                // Find all albums whose title is equal to " Sample Album "
                xExpress = xPath.compile("/album[title=' Sample Album ']");
                nl = (NodeList)xExpress.evaluate(doc.getDocumentElement(), XPathConstants.NODESET);
                for (int index = 0; index < nl.getLength(); index++) {

                    Node albumNode = nl.item(index);
                    Node titleNode = (Node) xPath.compile("title").evaluate(albumNode, XPathConstants.NODE);
                    System.out.println(titleNode.getTextContent());

                }

            } catch (SAXException | IOException | XPathExpressionException ex) {
                ex.printStackTrace();
            }
        } catch (ParserConfigurationException exp) {
            exp.printStackTrace();
        }
    }

}
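To get at the rest of the album information, the same XPath calls can be pointed at the other child elements. The following is only a sketch: the class AlbumXPathSketch and its helpers parse, field and songs are hypothetical names, and it assumes a single <album> root as in the question. The XPath function normalize-space() is used to strip the padding around values such as " Sample Album ".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name; assumes the document root is a single <album> element.
final class AlbumXPathSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Whitespace-trimmed text of one child of <album>, e.g. "title" or "year"
    static String field(Document doc, String childName) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.evaluate("normalize-space(/album/" + childName + ")", doc);
    }

    // Trimmed text of every <song> child of <album>, in document order
    static List<String> songs(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate("/album/song", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<album>\n"
                + "    <title> Sample Album </title>\n"
                + "    <year> 2014 </year>\n"
                + "    <musicalStyle> Waltz </musicalStyle>\n"
                + "    <song> Track 1 </song>\n"
                + "    <song> Track 2 </song>\n"
                + "</album>");
        System.out.println(field(doc, "title"));
        System.out.println(field(doc, "year"));
        System.out.println(field(doc, "musicalStyle"));
        System.out.println(songs(doc));
    }
}
```

Because each value is looked up by element name, the order of the children and the whitespace between them no longer matter.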

Perhaps you could try something like:

import java.io.File;
import java.util.LinkedList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class Test {

    public static final class Album {

        public final String title;
        public final String year;
        public final String style;

        public final List<Song> songs;

        Album(final String title, final String year, final String style){
            this.title = title;
            this.year = year;
            this.style = style;

            songs = new LinkedList<>();
        }
    }

    public static final class Song {

        public final Album album;
        public final String name;

        Song(final Album album, final String name){
            this.album = album;
            this.name = name;
        }
    }

    public static List<Album> getAlbums(final File xml) throws Exception {
        final List<Album> albums = new LinkedList<>();
        final Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        doc.getDocumentElement().normalize();
        final NodeList list = doc.getElementsByTagName("album");
        for(int i = 0; i < list.getLength(); i++){
            final Node node = list.item(i);
            if(node.getNodeType() != Node.ELEMENT_NODE)
                continue;
            final Element e = (Element) node;
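            // getNodeValue() is null for elements, and child 0 is usually whitespace text,
            // so look each field up by tag name and read its text content instead
            final Album album = new Album(
                    e.getElementsByTagName("title").item(0).getTextContent().trim(),
                    e.getElementsByTagName("year").item(0).getTextContent().trim(),
                    e.getElementsByTagName("musicalStyle").item(0).getTextContent().trim());
            final NodeList songs = e.getElementsByTagName("song");
            for(int j = 0; j < songs.getLength(); j++)
                album.songs.add(new Song(album, songs.item(j).getTextContent().trim()));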
            albums.add(album);
        }
        return albums;
    }
}

Parsing XML correctly requires a much more flexible (and complicated) mechanism than the routine you have here. You would do better to make use of an existing parser.

If you really want to write your own, this code is not the foundation of a workable approach. Remember that XML is not line based, and there is no requirement for an element's start and end tags to sit on the same line. This makes parsing a file line by line a difficult and awkward way to get started, and trying to identify elements by pattern matching one line at a time is simply a broken technique, because any element may span more than a single line.
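To make the point concrete, here is a small sketch in which the <title> element is spread over three lines. The class MultiLineTitleSketch and both of its methods are invented for this example; lineMatcherFindsTitle imitates the startsWith/endsWith check from the question, while domTitle uses the standard DOM parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name; contrasts a per-line tag check with a real parser.
final class MultiLineTitleSketch {

    // Line-by-line check in the style of the question's code
    static boolean lineMatcherFindsTitle(String xml) {
        for (String line : xml.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("<title>") && trimmed.endsWith("</title>")) {
                return true;
            }
        }
        return false;
    }

    // The DOM parser does not care how the element is laid out across lines
    static String domTitle(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("title").item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Well-formed XML, but the title element is split over three lines
        String xml = "<album>\n<title>\n  Sample Album\n</title>\n</album>";
        System.out.println(lineMatcherFindsTitle(xml)); // false
        System.out.println(domTitle(xml));              // Sample Album
    }
}
```

The document is perfectly valid XML, yet the per-line check never sees a line that both starts with <title> and ends with </title>.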

