
Using Java to parse XML

I made a PHP script that parses an XML file. It is not easy to use, so I want to implement it in Java.

Inside the first element there is a varying number of wfs:member elements, which I iterate through:

foreach ($data->children("wfs", true)->member as $member) { }

This is easy to do in Java:

NodeList wfsMember = doc.getElementsByTagName("wfs:member");
for(int i = 0; i < wfsMember.getLength(); i++) { }

I open the XML file like this:

DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(WeatherDatabaseUpdater.class.getResourceAsStream("wfs.xml"));

Then I need to get an attribute from an element called observedProperty. In PHP this is simple:

$member->
    children("omso", true)->PointTimeSeriesObservation->
    children("om", true)->observedProperty->
    attributes("xlink", true)->href

But how do I do this in Java? If I want to go deeper into the structure, do I have to use getElementsByTagName and loop through the results at every level?

In PHP, the whole script looks like this:

foreach ($data->children("wfs", true)->member as $member) {
    $dataType = $dataTypes[(string) $member->
                    children("omso", true)->PointTimeSeriesObservation->
                    children("om", true)->observedProperty->
                    attributes("xlink", true)->href];

    foreach ($member->
            children("omso", true)->PointTimeSeriesObservation->
            children("om", true)->result->
            children("wml2", true)->MeasurementTimeseries->
            children("wml2", true)->point as $point) {

        $time = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->time;
        $value = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->value;

        $data[$dataType][] = array($time, $value);
    }
}

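In the second foreach I iterate through the observation elements, get the time and value data from them, and save them in an array. If I had to traverse the elements in Java the way I described, this would be hard to implement. I don't think that is the case, so can someone suggest how to implement something similar in Java?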

If performance is not a major concern, the easiest approach is probably XPath. With XPath you simply specify a path to find nodes and attributes.

XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
// xpathExpression is a String holding the XPath to evaluate
XPathExpression expr = xpath.compile(xpathExpression);
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

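The expression can be as simple as

"string(//member/PointTimeSeriesObservation/observedProperty/@href)"

Since string() returns a String rather than a node set, evaluate such an expression with XPathConstants.STRING instead of NODESET. Also note that this path ignores namespaces: against the prefixed WFS document, the names only match if you register the prefixes through a NamespaceContext on a namespace-aware parser, or test local-name() instead.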

For more on XPath, the W3Schools XPath tutorial is very good.
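Applied to the question's whole PHP script, a namespace-aware XPath version could look roughly like the sketch below. It is an illustration under assumptions: the URIs in NS are placeholders that must be replaced with the ones declared in wfs.xml (only the XLink URI is the standard one), the $dataTypes lookup is left out so results are keyed by the raw href, and the class name, NS, CONTEXT, SAMPLE and parse() are made up for the example. Relative expressions evaluated against each member node replace the nested PHP children(...) chains.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespacedXPathDemo {
    // Placeholder namespace URIs: copy the real ones from the xmlns declarations in wfs.xml
    static final Map<String, String> NS = Map.of(
            "wfs", "urn:example:wfs", "omso", "urn:example:omso", "om", "urn:example:om",
            "wml2", "urn:example:wml2", "xlink", "http://www.w3.org/1999/xlink");

    // Resolves the prefixes used in the XPath expressions below
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return NS.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
        }

        @Override
        public String getPrefix(String uri) {
            return null; // not needed for evaluating expressions
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    };

    /** Returns observedProperty href -> list of {time, value}, like $data[$dataType][] in the PHP. */
    static Map<String, List<String[]>> parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // prefixed XPath names only match namespace-aware nodes
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xp = XPathFactory.newInstance().newXPath();
        xp.setNamespaceContext(CONTEXT);

        Map<String, List<String[]>> data = new LinkedHashMap<>();
        NodeList members = (NodeList) xp.evaluate("//wfs:member", doc, XPathConstants.NODESET);
        for (int i = 0; i < members.getLength(); i++) {
            Node member = members.item(i);
            // relative paths are evaluated against the current member
            String dataType = xp.evaluate(
                    "omso:PointTimeSeriesObservation/om:observedProperty/@xlink:href", member);
            NodeList points = (NodeList) xp.evaluate(
                    "omso:PointTimeSeriesObservation/om:result/wml2:MeasurementTimeseries/wml2:point",
                    member, XPathConstants.NODESET);
            List<String[]> series = data.computeIfAbsent(dataType, k -> new ArrayList<>());
            for (int j = 0; j < points.getLength(); j++) {
                Node point = points.item(j);
                String time = xp.evaluate("wml2:MeasurementTVP/wml2:time", point);
                String value = xp.evaluate("wml2:MeasurementTVP/wml2:value", point);
                series.add(new String[] {time, value});
            }
        }
        return data;
    }

    // Hypothetical sample shaped after the PHP traversal in the question
    static final String SAMPLE = "<wfs:FeatureCollection xmlns:wfs='urn:example:wfs'"
            + " xmlns:omso='urn:example:omso' xmlns:om='urn:example:om'"
            + " xmlns:wml2='urn:example:wml2' xmlns:xlink='http://www.w3.org/1999/xlink'>"
            + "<wfs:member><omso:PointTimeSeriesObservation>"
            + "<om:observedProperty xlink:href='temperature'/>"
            + "<om:result><wml2:MeasurementTimeseries>"
            + "<wml2:point><wml2:MeasurementTVP><wml2:time>2014-01-01T00:00:00Z</wml2:time>"
            + "<wml2:value>-1.5</wml2:value></wml2:MeasurementTVP></wml2:point>"
            + "<wml2:point><wml2:MeasurementTVP><wml2:time>2014-01-01T01:00:00Z</wml2:time>"
            + "<wml2:value>-2.0</wml2:value></wml2:MeasurementTVP></wml2:point>"
            + "</wml2:MeasurementTimeseries></om:result>"
            + "</omso:PointTimeSeriesObservation></wfs:member></wfs:FeatureCollection>";

    public static void main(String[] args) throws Exception {
        parse(SAMPLE).forEach((type, series) ->
                series.forEach(p -> System.out.println(type + " " + p[0] + " " + p[1])));
    }
}
```

Compiling each XPathExpression once (as in the snippet above) would be faster for large files; evaluate() on strings keeps the sketch short.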

There are several ways to implement XML parsing in Java.

The most common are DOM, SAX and StAX.

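Each has its pros and cons. DOM builds the whole document as an in-memory tree, while SAX (push) and StAX (pull) stream through it. With DOM and SAX you can validate the XML against an XSD schema while parsing; StAX readers do not do XSD validation themselves, but StAX tends to be faster.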
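Validation can also be done as a separate step with the javax.xml.validation API, independent of which parser reads the data afterwards. A minimal sketch; the schema is a hypothetical, trimmed stand-in for the answer's oldEmployee.xsd (which is not shown) and only covers name and salary, and the class and method names are made up.

```java
import java.io.StringReader;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

import org.xml.sax.SAXException;

class XsdValidationDemo {
    // Hypothetical schema: <staff> holding one or more <employee> with <name> and <salary>
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='staff'><xs:complexType><xs:sequence>"
            + "<xs:element name='employee' maxOccurs='unbounded'><xs:complexType><xs:sequence>"
            + "<xs:element name='name' type='xs:string'/>"
            + "<xs:element name='salary' type='xs:decimal'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    /** Returns true if xml is valid against XSD; otherwise prints the reason and returns false. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXParseException
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid(
                "<staff><employee><name>Carl Cracker</name><salary>75000</salary></employee></staff>"));
        System.out.println(isValid(
                "<staff><employee><name>Carl Cracker</name><salary>lots</salary></employee></staff>"));
    }
}
```

The same Validator also accepts a DOMSource or SAXSource, so an already-parsed DOM tree can be checked without reading the file twice.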

For example, take this XML file:

<?xml version="1.0" encoding="UTF-8"?>
<staff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="oldEmployee.xsd">
    <employee>
        <name>Carl Cracker</name>
        <salary>75000</salary>
        <hiredate year="1987" month="12" day="15" />
    </employee>
    <employee>
        <name>Harry Hacker</name>
        <salary>50000</salary>
        <hiredate year="1989" month="10" day="1" />
    </employee>
    <employee>
        <name>Tony Tester</name>
        <salary>40000</salary>
        <hiredate year="1990" month="3" day="15" />
    </employee>
</staff>
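The parsers below fill an Employee class that the answer does not show. A minimal sketch, assuming only the setters the parsers call: setHireDay() takes a zero-based month (the parsers pass month - 1) and stores a java.util.Date, which matches how the hireDay=... values in the output look. The exact Date.toString() text depends on the default time zone.

```java
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical reconstruction of the Employee class used by the DOM, SAX and StAX parsers
class Employee {
    private String name;
    private double salary;
    private Date hireDay;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public double getSalary() { return salary; }
    public void setSalary(double salary) { this.salary = salary; }

    public Date getHireDay() { return hireDay; }

    /** month is zero-based, as in GregorianCalendar (0 = January). */
    public void setHireDay(int year, int month, int day) {
        this.hireDay = new GregorianCalendar(year, month, day).getTime();
    }

    @Override
    public String toString() {
        return "Employee { name=" + name + ", salary=" + salary + ", hireDay=" + hireDay + " }";
    }
}
```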

The DOM parser, which (in my opinion) is the longest to implement:

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomXmlParser {
    private static final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    private static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    private final Document document;
    private final List<Employee> empList = new ArrayList<>();

    // filename replaces the answer's EMPLOYEE_XML.getFilename() constant
    public DomXmlParser(String filename) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // The schema language only takes effect if factory.setValidating(true) is also called
            factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            DocumentBuilder builder = factory.newDocumentBuilder();
            document = builder.parse(new File(filename));
        } catch (Exception e) {
            // fail fast instead of leaving document null for parseFromXmlToEmployee()
            throw new IllegalStateException("Could not parse " + filename, e);
        }
    }

    public List<Employee> parseFromXmlToEmployee() {
        NodeList nodeList = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < nodeList.getLength(); i++) {
            Node node = nodeList.item(i);

            // skip the whitespace text nodes between <employee> elements
            if (node instanceof Element) {
                Employee emp = new Employee();

                NodeList childNodes = node.getChildNodes();
                for (int j = 0; j < childNodes.getLength(); j++) {
                    Node cNode = childNodes.item(j);

                    // identify the child tag of employees
                    if (cNode instanceof Element) {
                        Element child = (Element) cNode;
                        switch (child.getNodeName()) {
                            case "name":
                                emp.setName(text(child));
                                break;
                            case "salary":
                                emp.setSalary(Double.parseDouble(text(child)));
                                break;
                            case "hiredate":
                                int yearAttr = Integer.parseInt(child.getAttribute("year"));
                                int monthAttr = Integer.parseInt(child.getAttribute("month"));
                                int dayAttr = Integer.parseInt(child.getAttribute("day"));

                                // setHireDay expects a zero-based month, hence the - 1
                                emp.setHireDay(yearAttr, monthAttr - 1, dayAttr);
                                break;
                        }
                    }
                }
                empList.add(emp);
            }
        }
        return empList;
    }

    private String text(Node cNode) {
        return cNode.getTextContent().trim();
    }
}
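The nested getChildNodes() loops are not the only way to walk the DOM: each employee element can be asked for its children by tag name. A self-contained sketch of that shorter variant, reading the same staff XML (trimmed to one employee) from a string instead of a file; the class and method names are made up, and the output format is just for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CompactDomDemo {
    static final String XML = "<staff><employee><name>Carl Cracker</name><salary>75000</salary>"
            + "<hiredate year='1987' month='12' day='15'/></employee></staff>";

    /** Returns "name;salary;year-month-day" for every employee element. */
    static List<String> describeEmployees(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> result = new ArrayList<>();
        NodeList employees = doc.getElementsByTagName("employee");
        for (int i = 0; i < employees.getLength(); i++) {
            Element emp = (Element) employees.item(i);
            // getElementsByTagName searches descendants, so no manual child loop is needed
            String name = childText(emp, "name");
            double salary = Double.parseDouble(childText(emp, "salary"));
            Element hire = (Element) emp.getElementsByTagName("hiredate").item(0);
            result.add(name + ";" + salary + ";" + hire.getAttribute("year") + "-"
                    + hire.getAttribute("month") + "-" + hire.getAttribute("day"));
        }
        return result;
    }

    // Text of the first descendant element with the given tag
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        describeEmployees(XML).forEach(System.out::println); // Carl Cracker;75000.0;1987-12-15
    }
}
```

The trade-off is that getElementsByTagName matches at any depth, so it fits flat structures like this one; for documents where the same tag appears at several levels, the explicit child loop is safer.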

SAX parser:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxHandler extends DefaultHandler {

    // names of the currently open elements, innermost on top
    private final Deque<String> elementStack = new ArrayDeque<>();
    // SAX may deliver an element's text in several characters() calls, so buffer it
    private final StringBuilder text = new StringBuilder();

    public List<Employee> employees = new ArrayList<>();
    private Employee employee = null;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        elementStack.push(qName);
        text.setLength(0);

        if ("employee".equals(qName)) {
            employee = new Employee();
            employees.add(employee);
        } else if ("hiredate".equals(qName) && employee != null) {
            int yearAttr = Integer.parseInt(attributes.getValue("year"));
            int monthAttr = Integer.parseInt(attributes.getValue("month"));
            int dayAttr = Integer.parseInt(attributes.getValue("day"));
            employee.setHireDay(yearAttr, monthAttr - 1, dayAttr);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        elementStack.pop();
        String value = text.toString().trim();
        text.setLength(0);

        // after the pop, the top of the stack is the parent element
        if (employee != null && "employee".equals(elementStack.peek())) {
            if ("name".equals(qName)) {
                employee.setName(value);
            } else if ("salary".equals(qName)) {
                employee.setSalary(Double.parseDouble(value));
            }
        }
        if ("employee".equals(qName)) {
            employee = null;
        }
    }
}
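The answer does not show how a handler is hooked up to a parser. A self-contained sketch of that wiring with SAXParserFactory; the small anonymous DefaultHandler only collects employee names (buffering characters(), since SAX may split text across calls), so the example runs without the Employee class. The class and method names are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxWiringDemo {
    static List<String> employeeNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                text.setLength(0); // start collecting text for this element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    names.add(text.toString().trim());
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // parser.parse(new File(...), handler) works the same way for a file on disk
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<staff><employee><name>Carl Cracker</name></employee>"
                + "<employee><name>Harry Hacker</name></employee></staff>";
        System.out.println(employeeNames(xml)); // [Carl Cracker, Harry Hacker]
    }
}
```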

StAX parser:

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxXmlParser {
    private final List<Employee> employeeList = new ArrayList<>();
    private Employee currentEmployee;

    public StaxXmlParser(String filename) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // XMLStreamReader.close() does not close the underlying stream, so close it here
        try (InputStream in = new FileInputStream(filename)) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            try {
                parseEmployee(reader);
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse " + filename, e);
        }
    }

    public List<Employee> getEmployeeList() {
        return employeeList;
    }

    private void parseEmployee(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                // attributes belong to the START_ELEMENT event and are read here
                switch (reader.getLocalName()) {
                    case "employee":
                        currentEmployee = new Employee();
                        break;
                    case "name":
                        // getElementText() reads all text up to the matching end tag
                        currentEmployee.setName(reader.getElementText().trim());
                        break;
                    case "salary":
                        currentEmployee.setSalary(Double.parseDouble(reader.getElementText().trim()));
                        break;
                    case "hiredate": {
                        int yearAttr = Integer.parseInt(reader.getAttributeValue(null, "year"));
                        int monthAttr = Integer.parseInt(reader.getAttributeValue(null, "month"));
                        int dayAttr = Integer.parseInt(reader.getAttributeValue(null, "day"));
                        currentEmployee.setHireDay(yearAttr, monthAttr - 1, dayAttr);
                        break;
                    }
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "employee".equals(reader.getLocalName())) {
                employeeList.add(currentEmployee);
            }
        }
    }
}

And some tests in main():

public static void main(String[] args) {
    long startTime, elapsedTime;
    // Main and its testSaxParser(), testStaxParser() and testDomParser() methods are not shown
    Main main = new Main();

    startTime = System.currentTimeMillis();
    main.testSaxParser();   // test
    elapsedTime = System.currentTimeMillis() - startTime;
    // elapsedTime is already in milliseconds, so it is not divided by 1000
    System.out.printf("Parsing time is: %d ms%n", elapsedTime);

    startTime = System.currentTimeMillis();
    main.testStaxParser();  // test
    elapsedTime = System.currentTimeMillis() - startTime;
    System.out.printf("Parsing time is: %d ms%n", elapsedTime);

    startTime = System.currentTimeMillis();
    main.testDomParser();  // test
    elapsedTime = System.currentTimeMillis() - startTime;
    System.out.printf("Parsing time is: %d ms%n", elapsedTime);
}

Output:

Using SAX Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 106 ms

Using StAX Parser:
------------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 5 ms

Using DOM Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 13 ms

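This gives you a glimpse of the different variants. Keep in mind that single-run timings on a file this small mostly measure class loading and JIT warm-up, so they say little about which parser is actually faster.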
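A slightly fairer way to time such runs is to warm up first and then average several timed calls with System.nanoTime(). A rough sketch under that assumption (ParseTimer and averageMillis() are made-up names); for serious measurements a harness such as JMH is the better tool.

```java
import java.io.StringReader;
import java.util.concurrent.Callable;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.xml.sax.InputSource;

class ParseTimer {
    /** Runs the task `warmups` times untimed, then returns the mean of `runs` timed calls in ms. */
    static double averageMillis(Callable<?> task, int warmups, int runs) throws Exception {
        for (int i = 0; i < warmups; i++) {
            task.call(); // give class loading and JIT compilation a chance to happen first
        }
        long start = System.nanoTime(); // monotonic clock intended for elapsed-time measurement
        for (int i = 0; i < runs; i++) {
            task.call();
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<staff><employee><name>Carl Cracker</name></employee></staff>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        double ms = averageMillis(() -> builder.parse(new InputSource(new StringReader(xml))), 20, 100);
        System.out.printf("DOM parse: %.3f ms per run%n", ms);
    }
}
```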

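Java also has JAXB: you typically start from an XSD schema and generate classes from it (for example with the xjc tool). After that you can use unmarshal() to read the XML file. (JAXB was removed from the JDK in Java 11, so newer versions need it as a separate dependency.)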

import java.io.FileInputStream;
import java.io.InputStream;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

public class JaxbDemo {
    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        // Staff/Staff.Employee are the classes generated from the XSD, and Files.EMPLOYEE_XML
        // is the answer's own file-name enum; neither is shown here
        try (InputStream in = new FileInputStream(Files.EMPLOYEE_XML.getFilename())) {
            // create the JAXB context for the generated package and an unmarshaller
            JAXBContext context = JAXBContext.newInstance(Staff.class.getPackage().getName());
            Unmarshaller um = context.createUnmarshaller();

            System.out.println("Output from employee XML file");
            Staff staff = (Staff) um.unmarshal(in);

            // print employee list
            for (Staff.Employee emp : staff.getEmployee()) {
                System.out.println(emp);
            }

            long elapsedTime = System.currentTimeMillis() - startTime;
            System.out.printf("Parsing time is: %d ms%n", elapsedTime);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I timed this the same way as before, and the result is:

Employee { name='Carl Cracker', salary=75000, hiredate=1987-12-15 } }
Employee { name='Harry Hacker', salary=50000, hiredate=1989-10-1 } }
Employee { name='Tony Tester', salary=40000, hiredate=1990-3-15 } }
Parsing time is: 320 ms

I added a different toString() here, which formats the hire date differently.

Here are some interesting links:

DOM parser with recursion

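As you already pointed out, with a DOM parser you can easily end up in a mess of nested for loops. However, the DOM is a tree of Node objects: each node's children come back as a NodeList whose items are again Nodes, which makes it an ideal candidate for recursion.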

Sample XML

To show that this DOM approach copes regardless of the size of the XML, I took a hosted sample OpenWeatherMap XML as the example.

Search by city name in XML format

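The XML contains London's weather forecast in 3-hour steps. This kind of XML is a good test case: it means reading through a relatively large dataset and extracting specific information from attributes of child elements.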


The goal is to collect the temperature and clouds elements of each forecast entry.

The code

We first create a custom class to hold the temperature and clouds values. We also override toString() in this class so that our records print conveniently.

ForeCast.java

public class ForeCast {

    /**
     * Overridden toString() to conveniently print the results
     */
    @Override
    public String toString() {
        return "The minimum temperature is: " + getTemperature()
                + " and the weather overall: " + getClouds();
    }

    public String getTemperature() {
        return temperature;
    }

    public void setTemperature(String temperature) {
        this.temperature = temperature;
    }

    public String getClouds() {
        return clouds;
    }

    public void setClouds(String clouds) {
        this.clouds = clouds;
    }

    private String temperature;
    private String clouds;
}

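Now on to the main class. In the main class, where the recursion happens, we create a List of ForeCast objects that stores the individual temperature and clouds records found while traversing the whole XML.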

// List collection which will hold all the data parsed from the XML
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();

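In the XML, the temperature and clouds elements both have time as their parent element, so the logical block checks for the time element: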

/**
 * Logical block
 */
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
        && node.getNodeType() == Node.ELEMENT_NODE) {
    // Instantiate our custom forecast object
    forecastObj = new ForeCast();
    Element timeElement = (Element) node;

After that, we get the temperature and clouds elements and set their values on the ForeCast object:

    // Get the temperature element by its tag name within the XML (0th
    // index known)
    Element tempElement = (Element) timeElement.getElementsByTagName("temperature").item(0);
    // Minimum temperature value is selectively picked (for proof of concept)
    forecastObj.setTemperature(tempElement.getAttribute("min"));

    // Similarly get the clouds element
    Element cloudElement = (Element) timeElement.getElementsByTagName("clouds").item(0);
    forecastObj.setClouds(cloudElement.getAttribute("value"));

The complete class is as follows:

CustomDomXmlParser.java

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class CustomDomXmlParser {

    // List collection which will hold all the data parsed from the XML
    // in the format defined by the custom type 'ForeCast'
    private static List<ForeCast> forecastList = new ArrayList<>();

    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, IOException {
        // Read the XML through a URL (a FileInputStream can be used to pick up an
        // XML file from the file system). Note: the OpenWeatherMap API expects an
        // API key in an appid parameter, which this URL does not include.
        try (InputStream path = new URL(
                "http://api.openweathermap.org/data/2.5/forecast?q=London,us&mode=xml")
                .openStream()) {

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(path);

            // Call the recursive method with the root element
            traverse(document.getDocumentElement());
        }

        // Print the List values collected within the recursive method
        for (ForeCast forecastObj : forecastList)
            System.out.println(forecastObj);

    }

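    /**
     * Recursively walks the DOM tree and adds a ForeCast for every time element.
     *
     * @param node the node whose subtree is traversed
     */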
    public static void traverse(Node node) {
        // Get the list of Child Nodes immediate to the current node
        NodeList list = node.getChildNodes();

        // Declare our local instance of forecast object
        ForeCast forecastObj = null;

        /**
         * Logical block
         */
        // As per the XML syntax our 2 fields temperature and clouds come
        // directly under the Node/Element time
        if (node.getNodeName().equals("time")
                && node.getNodeType() == Node.ELEMENT_NODE) {

            // Instantiate our custom forecast object
            forecastObj = new ForeCast();
            Element timeElement = (Element) node;

            // Get the temperature element by its tag name within the XML (0th
            // index known)
            Element tempElement = (Element) timeElement.getElementsByTagName(
                    "temperature").item(0);
            // Minimum temperature value is selectively picked (for proof of
            // concept)
            forecastObj.setTemperature(tempElement.getAttribute("min"));

            // Similarly get the clouds element
            Element cloudElement = (Element) timeElement.getElementsByTagName(
                    "clouds").item(0);
            forecastObj.setClouds(cloudElement.getAttribute("value"));
        }

        // Add our foreCastObj if initialized within this recursion, that is if
        // it traverses the time node within the XML, and not in any other case
        if (forecastObj != null)
            forecastList.add(forecastObj);

        /**
         * Recursion block
         */
        // Iterate over the next child nodes
        for (int i = 0; i < list.getLength(); i++) {
            Node currentNode = list.item(i);
            // Recursively invoke the method for the current node
            traverse(currentNode);

        }

    }
}

Output

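The output lists one line per forecast entry: the two specific elements are grouped together and their values assigned to a Java collection. The complex parsing of the XML is delegated to a generic recursive traversal, and only the logical block is customized. As mentioned above, this is a generic solution that needs minimal customization: the traversal works for any well-formed XML, and only the logical block depends on the document's structure.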
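The split between a generic recursion and a document-specific logical block can be made explicit by passing the logical block in as a callback. A sketch under assumptions: the inline SAMPLE is a tiny hand-made stand-in (time elements with temperature and clouds children, using the attribute names from the code above) because the live URL cannot be assumed reachable, and CallbackTraversal, walk() and forecasts() are made-up names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CallbackTraversal {
    // Hypothetical stand-in for the forecast XML; the values are made up
    static final String SAMPLE = "<sample>"
            + "<time><temperature min='281.2'/><clouds value='overcast clouds'/></time>"
            + "<time><temperature min='280.5'/><clouds value='clear sky'/></time>"
            + "</sample>";

    /** Generic part: calls visitor for every element, in document order. */
    static void walk(Node node, Consumer<Element> visitor) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            visitor.accept((Element) node);
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), visitor);
        }
    }

    static List<String> forecasts(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        List<String> out = new ArrayList<>();
        // Document-specific part (the "logical block"): only time elements matter
        walk(root, e -> {
            if (e.getNodeName().equals("time")) {
                Element temp = (Element) e.getElementsByTagName("temperature").item(0);
                Element clouds = (Element) e.getElementsByTagName("clouds").item(0);
                out.add(temp.getAttribute("min") + " / " + clouds.getAttribute("value"));
            }
        });
        return out;
    }

    public static void main(String[] args) throws Exception {
        forecasts(SAMPLE).forEach(System.out::println);
    }
}
```

With this shape, reusing the traversal for another document only means writing a new lambda; walk() itself never changes.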


Alternatives

There are many other alternatives; here is a list of open-source XML parsers for Java.

That said, your PHP approach and your initial work with a Java parser are in line with the DOM-based solution, which recursion simplifies.

The Java APIs give you everything you need, but as you can see they are very verbose. You can look at Xsylum for something more straightforward:

(Guessing at what your XML structure looks like):

List<XmlElement> elements = Xsylum.elementFor(xmlFile).getAll("wfs:member");
for (XmlElement e : elements) {
  String dataType = e.get("omso").get("om").attribute("xlink");
}

As suggested elsewhere, you may also just want to use XPath to extract what you are after, which is also simple with Xsylum:

List<String> values = Xsylum.documentFor(xmlFile).values("//omso/om/@href");

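I would not recommend writing your own parsing functions for XML, since there are already plenty of options. My suggestion is the DOM parser; you can find some examples at the following link (you can also choose one of the other available options):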

http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html

You can use calls such as

eElement.getAttribute("id");

Source: http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
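For the question's prefixed WFS document, a namespace-aware parser lets you address elements and attributes by namespace URI plus local name through getElementsByTagNameNS() and getAttributeNS(), instead of relying on the prefixes. A sketch under assumptions: the wfs and om URIs are placeholders (copy the real ones from wfs.xml), the XLink URI is the standard one, and the class name and SAMPLE are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceDomDemo {
    // Placeholder URIs for wfs and om; XLINK is the standard XLink namespace
    static final String WFS = "urn:example:wfs";
    static final String OM = "urn:example:om";
    static final String XLINK = "http://www.w3.org/1999/xlink";

    static final String SAMPLE = "<wfs:FeatureCollection xmlns:wfs='urn:example:wfs'"
            + " xmlns:om='urn:example:om' xmlns:xlink='http://www.w3.org/1999/xlink'>"
            + "<wfs:member><obs><om:observedProperty xlink:href='temperature'/></obs></wfs:member>"
            + "<wfs:member><obs><om:observedProperty xlink:href='windspeed'/></obs></wfs:member>"
            + "</wfs:FeatureCollection>";

    static List<String> observedProperties(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // the *NS lookups need namespace-aware parsing
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        List<String> hrefs = new ArrayList<>();
        NodeList members = doc.getElementsByTagNameNS(WFS, "member");
        for (int i = 0; i < members.getLength(); i++) {
            Element member = (Element) members.item(i);
            // searches all descendants, so intermediate levels can be skipped
            NodeList props = member.getElementsByTagNameNS(OM, "observedProperty");
            if (props.getLength() > 0) {
                hrefs.add(((Element) props.item(0)).getAttributeNS(XLINK, "href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(observedProperties(SAMPLE)); // [temperature, windspeed]
    }
}
```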

I agree with what has already been posted about not implementing the parsing functions yourself.

Instead of the DOM/SAX/StAX parsers, I suggest using JDOM or XOM, which are external libraries.

Related discussion:

My gut feeling is that JDOM is what most Java developers use. Some use dom4j, some XOM, some others, but hardly anyone implements their own parsing functions.


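Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.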

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM