Parsing Highly Nested XML without DOM in Java
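My task is to solve an annoying out-of-memory problem. IBM provides the Cognos SDK for use with Java; we query all packages stored in the content store, and they are returned as XML. We then parse that XML and write it to a SQL database. Profiling showed that the worst memory offender is char[], which is not very helpful (and the heap is so large that it is hard to profile), but it does point to the DOM parser.
We are talking about 500 to 1500 XML files (technically, streams of XML text) that are deeply nested and vary somewhat in size and structure. Sizes range from a few KB to 30 MB, and after about 300 packages the program is using more than 8 GB of memory. The programmer before me worked around this with a manual System.gc() call after each XML parse, which I would like to get rid of (it does not actually fix the problem, it only makes the job feasible on the smallest server, which has 500 packages).
I tried JAXB, but the structure is odd enough that it is hard to use here (I ran into a "folder or querySubject" issue). Last week I spent a few hours on StAX but could not get it fully working, and the same goes for Woodstox. I could not really find an example or tutorial for doing this with either. JDOM is what I will look at next (I have read that it handles memory better than plain DOM), but I do not know how to make it parse as deeply as DOM does. The current DOM parsing: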
// Parse the whole document into an in-memory DOM tree.
is = new ByteArrayInputStream(xml.getBytes("UTF-8"));
xmlDoc = builder.parse(is);
is.close();
String _path, datatype, regularAggregate, description, formula;
String table, tableLoc;
// Visit every element in the document, regardless of nesting depth.
NodeList elements = xmlDoc.getElementsByTagName("*");
for (int j = 0; j < elements.getLength(); j++) {
    Element element = (Element) elements.item(j);
    String nodeName = element.getNodeName();
    // Compare strings with equals(); == only compares object references.
    if (nodeName.equals("queryItem") || nodeName.equals("measure")
            || nodeName.equals("calculation") || nodeName.equals("filter")) {
        if (element.hasAttribute("_path")) {
            _path = element.getAttribute("_path");
        }
        // ... and so on for each attribute
    }
}
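For comparison, the loop above does not depend on nesting at all: it visits every element and keeps four element names. A streaming scan with the standard StAX API can do the same without building a tree. The following is only a minimal sketch under that assumption; the class name FlatScan, the method collectPaths and the sample XML are made up for illustration, and a real program would write each row to the database instead of collecting it in a list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class FlatScan {
    // Element names the DOM loop is interested in.
    private static final Set<String> WANTED =
            new HashSet<>(Arrays.asList("queryItem", "measure", "calculation", "filter"));

    /** Returns the _path attribute of every wanted element, in document order. */
    public static List<String> collectPaths(String xml) throws XMLStreamException {
        List<String> paths = new ArrayList<>();
        // Reading from a StringReader avoids the extra byte[] copy made by xml.getBytes("UTF-8").
        XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Only start tags carry attributes; all other events are ignored.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && WANTED.contains(reader.getLocalName())) {
                    String path = reader.getAttributeValue(null, "_path");
                    if (path != null) {
                        paths.add(path);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return paths;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical sample shaped like the package XML described in the question.
        String xml = "<ResponseRoot><folder _path=\"/f\">"
                + "<querySubject _path=\"/f/qs\"><queryItem _path=\"/f/qs/a\"/>"
                + "<queryItemFolder _path=\"/f/qs/d\"><queryItem _path=\"/f/qs/d/b\"/></queryItemFolder>"
                + "</querySubject><filter _path=\"/f/flt\"/></folder></ResponseRoot>";
        System.out.println(collectPaths(xml)); // prints [/f/qs/a, /f/qs/d/b, /f/flt]
    }
}
```

Because only the current event is held by the reader, memory use should stay roughly independent of document size, as long as the collected rows are flushed rather than accumulated.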
My JDOM attempt. For now it only prints the root element, and I cannot get down even to the first level of children:
SAXBuilder saxBuilder = new SAXBuilder();
Document document = saxBuilder.build(inputFile);
System.out.println("Root element :" + document.getRootElement().getName());
Element root = document.getRootElement();
// getChildren() returns only direct children with this name, not deeper descendants.
List<Element> rList = root.getChildren("folder");
if (rList != null) {
    for (Element node : rList) {
        List<Element> elements = node.getChildren("queryItem");
        if (elements != null) {
            for (Element a : elements) {
                System.out.println(a.getAttribute("_path"));
            }
            elements.size();
            rList.removeAll(elements);
        }
    }
}
XSD structure generated from a random package:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="ResponseRoot">
<xs:complexType>
<xs:sequence>
<xs:element ref="folder"/>
<xs:element ref="package"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="package">
<xs:complexType>
<xs:attribute name="description" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="folder">
<xs:complexType>
<xs:sequence>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="folder"/>
<xs:element ref="querySubject"/>
</xs:choice>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="filter"/>
</xs:sequence>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="isNamespace" use="required" type="xs:integer"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="querySubject">
<xs:complexType>
<xs:sequence>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="queryItem"/>
<xs:element minOccurs="0" maxOccurs="unbounded" ref="queryItemFolder"/>
</xs:sequence>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="filter">
<xs:complexType>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="expression" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="queryItem">
<xs:complexType>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="currency" use="required"/>
<xs:attribute name="datatype" use="required" type="xs:NCName"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="displayType" use="required" type="xs:NCName"/>
<xs:attribute name="expression" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="promptCascadeOnRef" use="required"/>
<xs:attribute name="promptDisplayItemRef" use="required"/>
<xs:attribute name="promptFilterItemRef" use="required"/>
<xs:attribute name="promptType" use="required" type="xs:NCName"/>
<xs:attribute name="regularAggregate" use="required" type="xs:NCName"/>
<xs:attribute name="screenTip" use="required"/>
<xs:attribute name="unSortable" use="required" type="xs:integer"/>
<xs:attribute name="usage" use="required" type="xs:NCName"/>
</xs:complexType>
</xs:element>
<xs:element name="queryItemFolder">
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element ref="queryItem"/>
<xs:element ref="queryItemFolder"/>
</xs:choice>
<xs:attribute name="_path" use="required"/>
<xs:attribute name="_ref" use="required"/>
<xs:attribute name="description" use="required"/>
<xs:attribute name="name" use="required"/>
<xs:attribute name="screenTip" use="required"/>
</xs:complexType>
</xs:element>
</xs:schema>
For nested structures, the code is easiest to manage if you create one method per element type.
Example
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxExample {
    public static void main(String[] args) throws Exception {
        String xml = "<root>" +
                       "<folder name=\"A\">" +
                         "<folder name=\"B\">" +
                           "<book name=\"Learn Java\">" +
                             "<chapter name=\"Hello, World!\"/>" +
                             "<chapter name=\"Variables and Types\"/>" +
                           "</book>" +
                         "</folder>" +
                       "</folder>" +
                     "</root>";
        XMLInputFactory factory = XMLInputFactory.newFactory();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag(); // Position on root element
            String tagName = reader.getLocalName();
            if (! tagName.equals("root"))
                throw new XMLStreamException("Expected <root> element, found: " + tagName, reader.getLocation());
            parseRoot(reader);
        } finally {
            reader.close();
        }
    }

    // Each parseXxx method is entered on the start tag and returns on the matching end tag.
    private static void parseRoot(XMLStreamReader reader) throws XMLStreamException {
        while (reader.nextTag() != XMLStreamConstants.END_ELEMENT) {
            String tagName = reader.getLocalName();
            if (tagName.equals("folder")) {
                parseFolder(reader, Collections.emptyList());
            } else {
                throw new XMLStreamException("Expected <folder> element, found: " + tagName, reader.getLocation());
            }
        }
    }

    private static void parseFolder(XMLStreamReader reader, List<String> parentPaths) throws XMLStreamException {
        String folderName = reader.getAttributeValue(null, "name");
        if (folderName == null)
            throw new XMLStreamException("Missing 'name' attribute on <folder> element", reader.getLocation());
        // Build this folder's path from its ancestors so nested calls know where they are.
        List<String> folderPath = new ArrayList<>(parentPaths.size() + 1);
        folderPath.addAll(parentPaths);
        folderPath.add(folderName);
        while (reader.nextTag() != XMLStreamConstants.END_ELEMENT) {
            String tagName = reader.getLocalName();
            if (tagName.equals("folder")) {
                parseFolder(reader, folderPath);
            } else if (tagName.equals("book")) {
                parseBook(reader, folderPath);
            } else {
                throw new XMLStreamException("Expected <folder> or <book> element, found: " + tagName, reader.getLocation());
            }
        }
    }

    private static void parseBook(XMLStreamReader reader, List<String> folderPath) throws XMLStreamException {
        String bookName = reader.getAttributeValue(null, "name");
        if (bookName == null)
            throw new XMLStreamException("Missing 'name' attribute on <book> element", reader.getLocation());
        while (reader.nextTag() != XMLStreamConstants.END_ELEMENT) {
            String tagName = reader.getLocalName();
            if (tagName.equals("chapter")) {
                parseChapter(reader, folderPath, bookName);
            } else {
                throw new XMLStreamException("Expected <chapter> element, found: " + tagName, reader.getLocation());
            }
        }
    }

    private static void parseChapter(XMLStreamReader reader, List<String> folderPath, String bookName) throws XMLStreamException {
        String chapterName = reader.getAttributeValue(null, "name");
        if (chapterName == null)
            throw new XMLStreamException("Missing 'name' attribute on <chapter> element", reader.getLocation());
        // getElementText() also advances the reader to the </chapter> end tag.
        if (! reader.getElementText().isEmpty())
            throw new XMLStreamException("<chapter> element must be empty", reader.getLocation());
        System.out.println("Found:");
        System.out.println(" Folder: " + folderPath);
        System.out.println(" Book: " + bookName);
        System.out.println(" Chapter: " + chapterName);
    }
}
Output
Found:
 Folder: [A, B]
 Book: Learn Java
 Chapter: Hello, World!
Found:
 Folder: [A, B]
 Book: Learn Java
 Chapter: Variables and Types
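The same pattern could be applied to the element names in the XSD from the question. What follows is a rough sketch, not a tested solution for real Cognos output: the class name PackageParser, the rows list (standing in for the database writes) and the sample XML in main are assumptions made for illustration. Unlike the strict example above, it skips elements it does not recognise, because the question says the structure varies between packages. It also handles querySubject and queryItemFolder in one method, since the XSD gives them the same child elements.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PackageParser {
    // Hypothetical sink; a real program would write to the SQL database here.
    private final List<String> rows = new ArrayList<>();

    public static List<String> parse(String xml) throws XMLStreamException {
        PackageParser parser = new PackageParser();
        XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        try {
            reader.nextTag(); // Position on <ResponseRoot>
            while (reader.nextTag() != XMLStreamConstants.END_ELEMENT) {
                if (reader.getLocalName().equals("folder")) {
                    parser.parseFolder(reader);
                } else {
                    skip(reader); // e.g. <package>
                }
            }
        } finally {
            reader.close();
        }
        return parser.rows;
    }

    // <folder>: may contain nested folders, query subjects and filters.
    private void parseFolder(XMLStreamReader reader) throws XMLStreamException {
        while (reader.nextTag() != XMLStreamConstants.END_ELEMENT) {
            switch (reader.getLocalName()) {
                case "folder":       parseFolder(reader); break;
                case "querySubject": parseItems(reader);  break;
                case "filter":       record(reader);      break;
                default:             skip(reader);
            }
        }
    }

    // <querySubject> and <queryItemFolder>: query items and nested item folders.
    private void parseItems(XMLStreamReader reader) throws XMLStreamException {
        while (reader.nextTag() != XMLStreamConstants.END_ELEMENT) {
            switch (reader.getLocalName()) {
                case "queryItem":       record(reader);     break;
                case "queryItemFolder": parseItems(reader); break;
                default:                skip(reader);
            }
        }
    }

    // Leaf element: read its attributes, then consume it through its end tag.
    private void record(XMLStreamReader reader) throws XMLStreamException {
        rows.add(reader.getLocalName() + " " + reader.getAttributeValue(null, "_path"));
        skip(reader);
    }

    // Consumes the current element and all its descendants; ends on its END_ELEMENT.
    private static void skip(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<ResponseRoot><folder _path=\"/f\"><folder _path=\"/f/g\">"
                + "<querySubject _path=\"/f/g/qs\"><queryItem _path=\"/f/g/qs/a\"/>"
                + "<queryItemFolder _path=\"/f/g/qs/d\"><queryItem _path=\"/f/g/qs/d/b\"/></queryItemFolder>"
                + "</querySubject></folder><filter _path=\"/f/flt\"/></folder>"
                + "<package name=\"p\"/></ResponseRoot>";
        // prints [queryItem /f/g/qs/a, queryItem /f/g/qs/d/b, filter /f/flt]
        System.out.println(parse(xml));
    }
}
```

Every method is entered on a start tag and returns on the matching end tag, so the recursion depth follows the nesting of the document while the reader holds only the current event.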