Java: Having trouble parsing XML with nested nodes

I have an XML file that looks something like this:

<album>
    <title> Sample Album </title>
    <year> 2014 </year>
    <musicalStyle> Waltz </musicalStyle>
        <song> Track 1 </song>
        <song> Track 2 </song>
        <song> Track 3 </song>
        <song> Track 4 </song>
        <song> Track 5 </song>
        <song> Track 6 </song>
        <song> Track 7 </song>
</album>

I was able to parse the songs by following a tutorial, but now I'm stuck on the nested nodes. Song.XMLtitleStartTag is <title> and Song.XMLtitleEndTag is </title>.

// Requires java.io.BufferedReader, java.io.File, java.io.FileReader, java.io.IOException
public static SongList parseFromFile(File inputFile){
    System.out.println("Parse File Data:");
    if(inputFile == null) return null;
    SongList theSongs = new SongList();

    // try-with-resources closes the reader even if reading fails
    try(BufferedReader inputFileReader = new BufferedReader(new FileReader(inputFile))){
        String inputLine; //current input line
        while((inputLine = inputFileReader.readLine()) != null){
            // Trim once so the tag checks and the substring offsets agree
            String line = inputLine.trim();
            if(line.startsWith(Song.XMLtitleStartTag) &&
                line.endsWith(Song.XMLtitleEndTag)){

                String titleString = line.substring(Song.XMLtitleStartTag.length(),
                        line.length() - Song.XMLtitleEndTag.length()).trim();

                if(titleString.length() > 0)
                    theSongs.add(new Song(titleString));
            }
        }
    } catch(IOException e){
        e.printStackTrace();
    }
    return theSongs;
}

I understand there are different ways to parse XML. I was wondering whether I should stick with the method I'm using and build on it, or try a different, easier approach.

I'd also appreciate a pointer on parsing the rest of the album information, if possible.

The short answer is yes: you should drop your current approach and use something else. Many hundreds of developer hours have gone into producing libraries that can parse XML files in a standardised manner.

There are any number of libraries available for parsing XML.

You could start by taking a look at the built-in API, the Java API for XML Processing (JAXP).

Generally it comes down to two approaches: SAX or DOM.

SAX is basically inline processing of the XML as it's parsed. This means that, as the parser reads the XML document, you are given the opportunity to handle each piece of content as it is encountered. This is good for large documents and when you only need linear access to the content.
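As a rough illustration of the SAX style, here is a minimal sketch that collects the text of every <song> element while the parser streams through the document. The class name SaxSongs and the method readSongs are made up for this example, and the XML is read from a string rather than a file purely to keep the sketch self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class, not part of the original answer
public class SaxSongs {

    // Returns the trimmed text of every <song> element, in document order
    static List<String> readSongs(String xml) throws Exception {
        final List<String> songs = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a <song> element
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("song".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // The parser may deliver one element's text in several chunks, so append
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("song".equals(qName)) {
                    songs.add(text.toString().trim());
                    text = null;
                }
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return songs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<album><title> Sample Album </title>"
                + "<song> Track 1 </song><song> Track 2 </song></album>";
        System.out.println(readSongs(xml));
    }
}
```

Nothing is kept in memory here except the list being built, which is why this style tends to suit large documents; the trade-off is that you have to track where you are in the document yourself.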

DOM (or Document Object Model) generates a model of the XML, which you can process at your leisure. It's better suited to smaller XML documents, as the entire model is normally read into memory, and to cases where you want to interact with the document in a non-linear fashion, such as searching.

The following is a simple snippet that loads an XML document into a DOM...

try {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    try {
        Document doc = builder.parse(new File("Album.xml"));
    } catch (SAXException | IOException ex) {
        ex.printStackTrace();
    }
} catch (ParserConfigurationException exp) {
    exp.printStackTrace();
}

Once you have the Document, you are ready to process it in any way you see fit. Personally, I'd take a look at XPath, which is a query API for XML.

For example...

import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class SongList {

    public static void main(String[] args) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            try {
                Document doc = builder.parse(new File("Album.xml"));

                XPathFactory xPathFactory = XPathFactory.newInstance();
                XPath xPath = xPathFactory.newXPath();

                // Find all album tags starting at the root level
                XPathExpression xExpress = xPath.compile("/album");
                NodeList nl = (NodeList)xExpress.evaluate(doc.getDocumentElement(), XPathConstants.NODESET);
                for (int index = 0; index < nl.getLength(); index++) {

                    Node albumNode = nl.item(index);
                    // Find the title node that is a child of the albumNode
                    Node titleNode = (Node) xPath.compile("title").evaluate(albumNode, XPathConstants.NODE);
                    System.out.println(titleNode.getTextContent());

                }

                // Find all albums whose title is equal to " Sample Album "
                xExpress = xPath.compile("/album[title=' Sample Album ']");
                nl = (NodeList)xExpress.evaluate(doc.getDocumentElement(), XPathConstants.NODESET);
                for (int index = 0; index < nl.getLength(); index++) {

                    Node albumNode = nl.item(index);
                    Node titleNode = (Node) xPath.compile("title").evaluate(albumNode, XPathConstants.NODE);
                    System.out.println(titleNode.getTextContent());

                }

            } catch (SAXException | IOException | XPathExpressionException ex) {
                ex.printStackTrace();
            }
        } catch (ParserConfigurationException exp) {
            exp.printStackTrace();
        }
    }

}
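The same idea can probably be extended to the rest of the album information from the question (year, musical style and songs). The following is only a sketch: the class name AlbumDetails and its helper methods are invented for illustration, the XML is parsed from a string to keep the example self-contained, and the XPath normalize-space function is used to strip the padding around values such as " 2014 ".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class, not part of the original answer
public class AlbumDetails {

    // Builds a DOM from an XML string (a File works the same way with parse(File))
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Reads a single child of <album>, e.g. "title", "year" or "musicalStyle",
    // with surrounding whitespace removed by normalize-space()
    static String field(Document doc, String name) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return xPath.evaluate("normalize-space(/album/" + name + ")", doc);
    }

    // Reads the trimmed text of every <song> under <album>, in document order
    static List<String> songs(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate("/album/song", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<album>"
                + "<title> Sample Album </title>"
                + "<year> 2014 </year>"
                + "<musicalStyle> Waltz </musicalStyle>"
                + "<song> Track 1 </song>"
                + "<song> Track 2 </song>"
                + "</album>");
        System.out.println(field(doc, "title"));
        System.out.println(field(doc, "year"));
        System.out.println(field(doc, "musicalStyle"));
        System.out.println(songs(doc));
    }
}
```

If a field is missing from the document, field returns an empty string rather than throwing, so callers may want to check for that.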

Perhaps you could try something like:

import java.io.File;
import java.util.LinkedList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class Test {

    public static final class Album {

        public final String title;
        public final String year;
        public final String style;

        public final List<Song> songs;

        Album(final String title, final String year, final String style){
            this.title = title;
            this.year = year;
            this.style = style;

            songs = new LinkedList<>();
        }
    }

    public static final class Song {

        public final Album album;
        public final String name;

        Song(final Album album, final String name){
            this.album = album;
            this.name = name;
        }
    }

    public static List<Album> getAlbums(final File xml) throws Exception {
        final List<Album> albums = new LinkedList<>();
        final Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        doc.getDocumentElement().normalize();
        final NodeList list = doc.getElementsByTagName("album");
        for(int i = 0; i < list.getLength(); i++){
            final Node node = list.item(i);
            if(node.getNodeType() != Node.ELEMENT_NODE)
                continue;
            final Element e = (Element) node;
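            // Child elements are separated by whitespace text nodes, and getNodeValue()
            // returns null for elements, so look each child up by tag name and read
            // its text with getTextContent() instead
            final Album album = new Album(
                    e.getElementsByTagName("title").item(0).getTextContent().trim(),
                    e.getElementsByTagName("year").item(0).getTextContent().trim(),
                    e.getElementsByTagName("musicalStyle").item(0).getTextContent().trim());
            final NodeList songs = e.getElementsByTagName("song");
            for(int j = 0; j < songs.getLength(); j++)
                album.songs.add(new Song(album, songs.item(j).getTextContent().trim()));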
            albums.add(album);
        }
        return albums;
    }
}

Parsing XML correctly requires a much more flexible (and complicated) mechanism than the routine you have here. You would do better to make use of an existing parser.

If you really want to write your own, this code is not the foundation of a workable approach. Remember that XML is not line based, and there is no requirement for related tags to be on the same line. This makes parsing a file line by line a difficult and awkward way to get started, and trying to identify elements by pattern matching one line at a time is simply a broken technique (any element may span more than a single line).
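To make the point concrete, here is a small sketch comparing a line-by-line check in the style of the question with a DOM lookup, on a title whose tags sit on separate lines. The class MultiLineTitle and both method names are hypothetical, and the line-based method is a simplified stand-in for the question's code, not a copy of it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class, written only to illustrate the point above
public class MultiLineTitle {

    // Line-based matching: only succeeds when both tags are on the same line.
    // Returns null when no such line exists.
    static String titleByLine(String xml) {
        for (String rawLine : xml.split("\n")) {
            String line = rawLine.trim();
            if (line.startsWith("<title>") && line.endsWith("</title>")) {
                return line.substring("<title>".length(),
                        line.length() - "</title>".length()).trim();
            }
        }
        return null;
    }

    // DOM-based lookup: line breaks inside the element make no difference
    static String titleByDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("title").item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Perfectly valid XML, but the title spans three lines
        String xml = "<album>\n<title>\n  Sample Album\n</title>\n</album>";
        System.out.println("line-based: " + titleByLine(xml));
        System.out.println("DOM:        " + titleByDom(xml));
    }
}
```

With this input the line-based method finds nothing, because no single line both starts with <title> and ends with </title>, while the DOM lookup returns the title text.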
