[英]Using Java to parse XML
我制作了一個解析XML文件的PHP腳本。 這不容易使用,我想用Java實現它。
在第一個元素內部有不同數量的wfs:member
我遍歷的wfs:member
元素:
foreach ($data->children("wfs", true)->member as $member) { }
這很容易用Java做:
NodeList wfsMember = doc.getElementsByTagName("wfs:member");
for(int i = 0; i < wfsMember.getLength(); i++) { }
我已經打開了這樣的XML文件
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document doc = documentBuilder.parse(WeatherDatabaseUpdater.class.getResourceAsStream("wfs.xml"));
然后我需要從一個名為observerdProperty
的元素中獲取一個屬性。 在PHP中這很簡單:
$member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href
但在Java中,我該怎么做? 如果我想深入了解結構,我是否需要使用getElementsByTagName
並循環遍歷它們?
在PHP中,整個腳本看起來如下。
foreach ($data->children("wfs", true)->member as $member) {
$dataType = $dataTypes[(string) $member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->observedProperty->
attributes("xlink", true)->href];
foreach ($member->
children("omso", true)->PointTimeSeriesObservation->
children("om", true)->result->
children("wml2", true)->MeasurementTimeseries->
children("wml2", true)->point as $point) {
$time = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->time;
$value = $point->children("wml2", true)->MeasurementTVP->children("wml2", true)->value;
$data[$dataType][] = array($time, $value)
}
}
在第二個foreach
我遍歷觀察元素並從中獲取時間和值數據。 然后我將它保存在一個數組中。 如果我需要按照我描述的方式遍歷Java中的元素,這很難實現。 我不認為是這種情況,所以有人可以建議我如何在Java中實現類似的東西嗎?
如果性能不是主要問題,最簡單的方法可能是XPath。 使用XPath,您只需指定路徑即可找到節點和屬性。
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
xpath_expression可以很簡單
"string(//member/observedProperty/@href)"
有關XPath的更多信息, W3Schools的XPath教程非常好。
如何在Java上實現XML解析幾乎沒有變化。
最常見的是: DOM,SAX,StAX 。
每個人都有利弊。 使用Dom和Sax,您可以使用xsd架構驗證xml。 但是Stax在沒有xsd驗證的情況下工作,並且速度更快。
例如, xml文件 :
<?xml version="1.0" encoding="UTF-8"?>
<staff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="oldEmployee.xsd">
<employee>
<name>Carl Cracker</name>
<salary>75000</salary>
<hiredate year="1987" month="12" day="15" />
</employee>
<employee>
<name>Harry Hacker</name>
<salary>50000</salary>
<hiredate year="1989" month="10" day="1" />
</employee>
<employee>
<name>Tony Tester</name>
<salary>40000</salary>
<hiredate year="1990" month="3" day="15" />
</employee>
</staff>
實現中最長的(在我看來) DOM解析器:
class DomXmlParser {
private Document document;
List<Employee> empList = new ArrayList<>();
public SchemaFactory schemaFactory;
public final String JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
public final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";
public DomXmlParser() {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(new File(EMPLOYEE_XML.getFilename()));
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseFromXmlToEmployee() {
NodeList nodeList = document.getDocumentElement().getChildNodes();
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
if (node instanceof Element) {
Employee emp = new Employee();
NodeList childNodes = node.getChildNodes();
for (int j = 0; j < childNodes.getLength(); j++) {
Node cNode = childNodes.item(j);
// identify the child tag of employees
if (cNode instanceof Element) {
switch (cNode.getNodeName()) {
case "name":
emp.setName(text(cNode));
break;
case "salary":
emp.setSalary(Double.parseDouble(text(cNode)));
break;
case "hiredate":
int yearAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("year").getNodeValue());
int monthAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("month").getNodeValue());
int dayAttr = Integer.parseInt(cNode.getAttributes().getNamedItem("day").getNodeValue());
emp.setHireDay(yearAttr, monthAttr - 1, dayAttr);
break;
}
}
}
empList.add(emp);
}
}
return empList;
}
private String text(Node cNode) {
return cNode.getTextContent().trim();
}
}
SAX解析器:
class SaxHandler extends DefaultHandler {
private Stack<String> elementStack = new Stack<>();
private Stack<Object> objectStack = new Stack<>();
public List<Employee> employees = new ArrayList<>();
Employee employee = null;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
this.elementStack.push(qName);
if ("employee".equals(qName)) {
employee = new Employee();
this.objectStack.push(employee);
this.employees.add(employee);
}
if("hiredate".equals(qName))
{
int yearatt = Integer.parseInt(attributes.getValue("year"));
int monthatt = Integer.parseInt(attributes.getValue("month"));
int dayatt = Integer.parseInt(attributes.getValue("day"));
if (employee != null) {
employee.setHireDay(yearatt, monthatt - 1, dayatt) ;
}
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
this.elementStack.pop();
if ("employee".equals(qName)) {
Object objects = this.objectStack.pop();
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
String value = new String(ch, start, length).trim();
if (value.length() == 0) return; // skip white space
if ("name".equals(currentElement())) {
employee = (Employee) this.objectStack.peek();
employee.setName(value);
} else if ("salary".equals(currentElement()) && "employee".equals(currentParrentElement())) {
employee.setSalary(Double.parseDouble(value));
}
}
private String currentElement() {
return this.elementStack.peek();
}
private String currentParrentElement() {
if (this.elementStack.size() < 2) return null;
return this.elementStack.get(this.elementStack.size() - 2);
}
}
Stax解析器:
class StaxXmlParser {
private List<Employee> employeeList;
private Employee currentEmployee;
private String tagContent;
private String attrContent;
private XMLStreamReader reader;
public StaxXmlParser(String filename) {
employeeList = null;
currentEmployee = null;
tagContent = null;
try {
XMLInputFactory factory = XMLInputFactory.newFactory();
reader = factory.createXMLStreamReader(new FileInputStream(new File(filename)));
parseEmployee();
} catch (Exception e) {
e.printStackTrace();
}
}
public List<Employee> parseEmployee() throws XMLStreamException {
while (reader.hasNext()) {
int event = reader.next();
switch (event) {
case XMLStreamConstants.START_ELEMENT:
if ("employee".equals(reader.getLocalName())) {
currentEmployee = new Employee();
}
if ("staff".equals(reader.getLocalName())) {
employeeList = new ArrayList<>();
}
if ("hiredate".equals(reader.getLocalName())) {
int yearAttr = Integer.parseInt(reader.getAttributeValue(null, "year"));
int monthAttr = Integer.parseInt(reader.getAttributeValue(null, "month"));
int dayAttr = Integer.parseInt(reader.getAttributeValue(null, "day"));
currentEmployee.setHireDay(yearAttr, monthAttr - 1, dayAttr);
}
break;
case XMLStreamConstants.CHARACTERS:
tagContent = reader.getText().trim();
break;
case XMLStreamConstants.ATTRIBUTE:
int count = reader.getAttributeCount();
for (int i = 0; i < count; i++) {
System.out.printf("count is: %d%n", count);
}
break;
case XMLStreamConstants.END_ELEMENT:
switch (reader.getLocalName()) {
case "employee":
employeeList.add(currentEmployee);
break;
case "name":
currentEmployee.setName(tagContent);
break;
case "salary":
currentEmployee.setSalary(Double.parseDouble(tagContent));
break;
}
}
}
return employeeList;
}
}
還有一些main()測試:
public static void main(String[] args) {
long startTime, elapsedTime;
Main main = new Main();
startTime = System.currentTimeMillis();
main.testSaxParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
startTime = System.currentTimeMillis();
main.testStaxParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
startTime = System.currentTimeMillis();
main.testDomParser(); // test
elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime / 1000));
}
輸出:
Using SAX Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 106 ms
Using StAX Parser:
------------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 5 ms
Using DOM Parser:
-----------------
Employee { name=Carl Cracker, salary=75000.0, hireDay=Tue Dec 15 00:00:00 EET 1987 }
Employee { name=Harry Hacker, salary=50000.0, hireDay=Sun Oct 01 00:00:00 EET 1989 }
Employee { name=Tony Tester, salary=40000.0, hireDay=Thu Mar 15 00:00:00 EET 1990 }
Parsing time is: 13 ms
你可以看到一些變化的一瞥。
但是在java中存在其他作為JAXB - 你需要有xsd
模式並且符合這個模式你生成類。 在此之后你可以使用unmarchal()
從xml
文件中讀取:
public class JaxbDemo {
public static void main(String[] args) {
try {
long startTime = System.currentTimeMillis();
// create jaxb and instantiate marshaller
JAXBContext context = JAXBContext.newInstance(Staff.class.getPackage().getName());
FileInputStream in = new FileInputStream(new File(Files.EMPLOYEE_XML.getFilename()));
System.out.println("Output from employee XML file");
Unmarshaller um = context.createUnmarshaller();
Staff staff = (Staff) um.unmarshal(in);
// print employee list
for (Staff.Employee emp : staff.getEmployee()) {
System.out.println(emp);
}
long elapsedTime = System.currentTimeMillis() - startTime;
System.out.println(String.format("Parsing time is: %d ms%n", elapsedTime));
} catch (Exception e) {
e.printStackTrace();
}
}
}
我像以前一樣試過這個方法,結果是下一個:
Employee { name='Carl Cracker', salary=75000, hiredate=1987-12-15 } }
Employee { name='Harry Hacker', salary=50000, hiredate=1989-10-1 } }
Employee { name='Tony Tester', salary=40000, hiredate=1990-3-15 } }
Parsing time is: 320 ms
我添加了另一個toString()
,它有不同的雇用日格式。
以下是一些有趣的鏈接:
正如您已經指出的那樣,使用DOM
解析器,您可以輕松地陷入亂七八糟的嵌套for
循環。 然而, DOM
結構由包含NodeList
形式的Node
節點集合表示,其中每個元素又是一個Node
- 這成為遞歸的理想候選者。
為了展示DOM
解析器折扣XML大小的能力,我舉了一個托管樣本OpenWeatherMap XML的例子。
此XML
包含倫敦每3小時的天氣預報。 這種XML
可以很好地通過相對較大的數據集進行讀取,並通過子元素中的屬性提取特定信息。
在快照中,我們的目標是收集箭頭標記的Elements
。
我們首先創建一個Custom類來保存溫度和雲值。 我們還會覆蓋此自定義類的toString()
以方便地打印我們的記錄。
ForeCast.java
public class ForeCast {
/**
* Overridden toString() to conveniently print the results
*/
@Override
public String toString() {
return "The minimum temperature is: " + getTemperature()
+ " and the weather overall: " + getClouds();
}
public String getTemperature() {
return temperature;
}
public void setTemperature(String temperature) {
this.temperature = temperature;
}
public String getClouds() {
return clouds;
}
public void setClouds(String clouds) {
this.clouds = clouds;
}
private String temperature;
private String clouds;
}
現在到主班。 在我們執行遞歸的主類中,我們想要創建一個ForeCast
對象List
,它通過遍歷整個XML來存儲單獨的溫度和雲記錄。
// List collection which is would hold all the data parsed through the XML
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
在XML中,溫度和雲元素的父元素都是時間 ,我們會在邏輯上檢查時間元素。
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
之后,我們將獲得可以設置為ForeCast
對象的溫度和雲元素的ForeCast
。
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName("temperature").item(0);
// Minimum temperature value is selectively picked (for proof of concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName("clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
完整的課程如下:
CustomDomXmlParser.java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class CustomDomXmlParser {
// List collection which is would hold all the data parsed through the XML
// in the format defined by the custom type 'ForeCast'
private static List<ForeCast> forecastList = new ArrayList<>();
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException {
// Read XML throuhg a URL (a FileInputStream can be used to pick up an
// XML file from the file system)
InputStream path = new URL(
"http://api.openweathermap.org/data/2.5/forecast?q=London,us&mode=xml")
.openStream();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(path);
// Call to the recursive method with the parent node
traverse(document.getDocumentElement());
// Print the List values collected within the recursive method
for (ForeCast forecastObj : forecastList)
System.out.println(forecastObj);
}
/**
*
* @param node
*/
public static void traverse(Node node) {
// Get the list of Child Nodes immediate to the current node
NodeList list = node.getChildNodes();
// Declare our local instance of forecast object
ForeCast forecastObj = null;
/**
* Logical block
*/
// As per the XML syntax our 2 fields temperature and clouds come
// directly under the Node/Element time
if (node.getNodeName().equals("time")
&& node.getNodeType() == Node.ELEMENT_NODE) {
// Instantiate our custom forecast object
forecastObj = new ForeCast();
Element timeElement = (Element) node;
// Get the temperature element by its tag name within the XML (0th
// index known)
Element tempElement = (Element) timeElement.getElementsByTagName(
"temperature").item(0);
// Minimum temperature value is selectively picked (for proof of
// concept)
forecastObj.setTemperature(tempElement.getAttribute("min"));
// Similarly get the clouds element
Element cloudElement = (Element) timeElement.getElementsByTagName(
"clouds").item(0);
forecastObj.setClouds(cloudElement.getAttribute("value"));
}
// Add our foreCastObj if initialized within this recursion, that is if
// it traverses the time node within the XML, and not in any other case
if (forecastObj != null)
forecastList.add(forecastObj);
/**
* Recursion block
*/
// Iterate over the next child nodes
for (int i = 0; i < list.getLength(); i++) {
Node currentNode = list.item(i);
// Recursively invoke the method for the current node
traverse(currentNode);
}
}
}
您可以從下面的屏幕截圖中了解到,我們能夠將2個特定元素組合在一起,並將其值有效地分配給Java Collection
實例。 我們將xml
的復雜解析委托給通用遞歸解決方案,並主要定制logical block
部分。 如上所述,它是一種具有最小定制的遺傳解決方案,可以通過所有有效的xmls
。
還有許多其他替代方案,這里是Java的開源XML解析器列表。
但是,您使用PHP的方法和基於Java的解析器的初始工作與基於DOM的XML解析器解決方案一致,通過使用遞歸進行了簡化。
Java API雖然可以為您提供所需的一切,但您可以看到它們非常荒謬。 您可以查看Xsylum以獲得更直接的內容:
(猜測你的XML結構如何):
List<XmlElement> elements = Xsylum.elementFor(xmlFile).getAll("wfs:member");
for (XmlElement e : elements)
String dataType = e.get("omso").get("om").attribute("xlink");
正如其他地方所建議的那樣,您也可能只想使用XPath來提取您所追求的內容,這對於Xsylum來說也很簡單:
List<String> values = Xsylum.documentFor(xmlFile).values("//omso/om/@href");
我不建議您為XML解析實現自己的解析函數,因為那里已經有很多選項。 我的建議是DOM解析器。 您可以在以下鏈接中找到一些示例。 (您也可以選擇其他可用選項)
http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html
您可以使用諸如的命令
eElement.getAttribute("id");
資料來源: http : //www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
我同意已經發布的關於不自己實現解析函數的內容。
而不是DOM / SAX / STAX解析器,我建議使用JDOM或XOM,它們是外部庫。
相關討論:
我的直覺是jdom是大多數java開發人員使用的。 有些人使用dom4j,有些是xom,有些是其他人,但幾乎沒有人使用這些解析函數。
對DOM解析器使用Java startElement和endElement
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.