Parsing XML using Java DOM Parser
I am new to Java and XML, and I need to get some data out of an XML file.
Here is my XML:
<?xml version="1.0" encoding="UTF-8"?>
<course name="BSc (Hons) Software Engineering" version="5.0" type="FT" lowerbound="2012" upperbound="2014" >
<year id="1">
<semester id="1">
<module>
<code>HCA1105C</code>
<name>Computer Architecture</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>PROG1115C</code>
<name>Object Oriented Software Development I</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>MATH1103C</code>
<name>Decision Mathematics</name>
<credits>3</credits>
<hrs_per_wk>2+1</hrs_per_wk>
</module>
<module>
<code>ITE1107C</code>
<name>Language and Communication Seminar</name>
<credits>3</credits>
<hrs_per_wk>2+1</hrs_per_wk>
</module>
<module>
<code>MGMT1101C</code>
<name>Management Seminar</name>
<credits>3</credits>
<hrs_per_wk>2+1</hrs_per_wk>
</module>
</semester>
<semester id="2">
<module>
<code>PROG1116C</code>
<name>Object Oriented Software Development II</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>WAT1116C</code>
<name>Internet Programming I</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>MATH1101C</code>
<name>Analytic Methods for Computing</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>DBT1111C</code>
<name>Database Design</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
</semester>
</year>
<year id="2">
<semester id="1">
<module>
<code>CAN2112C</code>
<name>Network Design &amp; Programming</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>WAT2117C</code>
<name>Internet Programming II</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>OSS2109C</code>
<name>Operating Systems</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>PROG2117C</code>
<name>Desktop Application Development</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
</semester>
<semester id="2">
<module>
<code>SDT2114C</code>
<name>Requirements Engineering</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>MATH2323C</code>
<name>Numerical Methods</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>MCT2104C</code>
<name>Mobile Application Development</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>MCT2104C</code>
<name>Mobile Application Development</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>WAT2124C</code>
<name>Web Services</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>MGMT2104C</code>
<name>Research &amp; Development Seminar</name>
<credits>3</credits>
<hrs_per_wk>2+1</hrs_per_wk>
</module>
</semester>
</year>
<year id="3">
<semester id="1">
<module>
<code>SECU3119C</code>
<name>Secure Software Development</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>MULT3114C</code>
<name>Game Development</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>SEM3112C</code>
<name>Project Management Seminar</name>
<credits>3</credits>
<hrs_per_wk>2+1</hrs_per_wk>
</module>
</semester>
<semester id="2">
<module>
<code>SDT3104C</code>
<name>Enterprise Software Development</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>WAT3125C</code>
<name>Emerging Web Technologies</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>SEM3113C</code>
<name>Software Quality Management</name>
<credits>4</credits>
<hrs_per_wk>2+2</hrs_per_wk>
</module>
<module>
<code>MGMT3105C</code>
<name>Entrepreneurship Seminar</name>
<credits>3</credits>
<hrs_per_wk>2+1</hrs_per_wk>
</module>
<module>
<code>PROJ3105C</code>
<name>Systems Development Project</name>
<credits>9</credits>
<hrs_per_wk />
</module>
</semester>
</year>
</course>
For example, I would like to get all the module codes for semester 1 of year 1:
HCA1105C
PROG1115C
MATH1103C
ITE1107C
MGMT1101C
This is my code so far:
try {
    File inputFile = new File(System.getProperty("user.dir") + "/courses/bse.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(inputFile);
    doc.getDocumentElement().normalize();
    NodeList nList = doc.getElementsByTagName("year");
    for (int i = 0; i < nList.getLength(); i++) {
        Node nNode = nList.item(i);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element eElement = (Element) nNode;
            //if (Integer.parseInt(eElement.getAttribute("id")) == 1 ) {
            System.out.println(eElement.getElementsByTagName("code").item(0).getTextContent());
            //}
        }
    }
} catch (Exception e) {
    JOptionPane.showMessageDialog(null, e.getMessage(), "Fatal Error", JOptionPane.ERROR_MESSAGE);
    System.exit(1);
}
I get the following output:
HCA1105C
CAN2112C
SECU3119C
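Your code reads only the first module of each year. That is because getElementsByTagName("year") returns a node list with three nodes (year 1, year 2, year 3), and for each of them you print only item(0) of its code elements.
If you want to read all the modules of year 1, you need to descend into the part of the document under the year element with id="1". That gives you the node list of its semesters, and you then need to descend further into the child with semester id="1".
Alternatively, you can try an XPath query, which lets you select the modules of year 1 and semester 1 directly.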
http://viralpatel.net/blogs/java-xml-xpath-tutorial-parse-xml/
Edit: modified code using XPath:
try {
    File inputFile = new File("courses.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(inputFile);
    doc.getDocumentElement().normalize();
    XPath xPath = XPathFactory.newInstance().newXPath();
    // Select every <code> under year 1, semester 1
    String expression = "/course/year[@id=1]/semester[@id=1]/module/code";
    NodeList nodeList = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);
    System.out.println(expression);
    for (int i = 0; i < nodeList.getLength(); i++) {
        System.out.println(nodeList.item(i).getTextContent());
    }
} catch (Exception e) {
    JOptionPane.showMessageDialog(null, e.getMessage(), "Fatal Error", JOptionPane.ERROR_MESSAGE);
    System.exit(1);
}
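For anyone who wants to try the XPath approach without a file on disk, here is a minimal self-contained sketch. The class name XPathModuleCodes, the helper moduleCodes, and the shortened SAMPLE document are my own assumptions for illustration; they are not part of the original question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathModuleCodes {

    // Shortened stand-in for the course XML (assumption, not the real file)
    static final String SAMPLE =
        "<course name=\"demo\">"
      + "<year id=\"1\">"
      + "<semester id=\"1\">"
      + "<module><code>HCA1105C</code></module>"
      + "<module><code>PROG1115C</code></module>"
      + "</semester>"
      + "<semester id=\"2\"><module><code>PROG1116C</code></module></semester>"
      + "</year>"
      + "<year id=\"2\">"
      + "<semester id=\"1\"><module><code>CAN2112C</code></module></semester>"
      + "</year>"
      + "</course>";

    // Hypothetical helper: text of every <code> under the given year and semester
    static List<String> moduleCodes(String xml, int year, int semester) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            String expression = "/course/year[@id='" + year + "']"
                    + "/semester[@id='" + semester + "']/module/code";
            NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> codes = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                codes.add(nodes.item(i).getTextContent().trim());
            }
            return codes;
        } catch (Exception e) {
            // Wrap parser and XPath failures so callers need no checked exceptions
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(moduleCodes(SAMPLE, 1, 1)); // prints [HCA1105C, PROG1115C]
    }
}
```

Taking the year and semester as int parameters keeps the expression safe to build by concatenation; with free-form strings you would probably want an XPathVariableResolver instead.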
Checking the child nodes and drilling down to the modules gives the expected result:
public static void main(String[] args) {
    try {
        File inputFile = new File("Snippet.xml");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(inputFile);
        doc.getDocumentElement().normalize();
        NodeList nList = doc.getElementsByTagName("year");
        for (int i = 0; i < nList.getLength(); i++) {
            Node nNode = nList.item(i);
            if (nNode.getNodeType() == Node.ELEMENT_NODE) {
                Element eElement = (Element) nNode;
                if (Integer.parseInt(eElement.getAttribute("id")) == 1) { // Found year 1
                    NodeList semesterList = eElement.getChildNodes();
                    for (int j = 0; j < semesterList.getLength(); j++) {
                        Node semNode = semesterList.item(j);
                        if (semNode.getNodeType() == Node.ELEMENT_NODE) { // Skip whitespace text nodes
                            Element semesterNode = (Element) semNode;
                            if (Integer.parseInt(semesterNode.getAttribute("id")) == 1) { // Found semester 1
                                NodeList moduleList = semesterNode.getChildNodes();
                                for (int k = 0; k < moduleList.getLength(); k++) {
                                    Node modNode = moduleList.item(k);
                                    if (modNode.getNodeType() == Node.ELEMENT_NODE) {
                                        Element moduleNode = (Element) modNode;
                                        System.out.println(moduleNode.getElementsByTagName("code").item(0).getTextContent());
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
        JOptionPane.showMessageDialog(null, e.getMessage(), "Fatal Error", JOptionPane.ERROR_MESSAGE);
        System.exit(1);
    }
}
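The same year -> semester -> module descent can be written with less nesting by moving the "element children with this tag and id" check into a small helper. This is only a sketch: the class DomModuleCodes, the helper children, and the shortened SAMPLE document are names I made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomModuleCodes {

    // Shortened stand-in for the course XML (assumption, not the real file)
    static final String SAMPLE =
        "<course name=\"demo\">\n"
      + "  <year id=\"1\">\n"
      + "    <semester id=\"1\">\n"
      + "      <module><code>HCA1105C</code></module>\n"
      + "      <module><code>PROG1115C</code></module>\n"
      + "    </semester>\n"
      + "    <semester id=\"2\"><module><code>PROG1116C</code></module></semester>\n"
      + "  </year>\n"
      + "  <year id=\"2\">\n"
      + "    <semester id=\"1\"><module><code>CAN2112C</code></module></semester>\n"
      + "  </year>\n"
      + "</course>";

    // Hypothetical helper: direct child elements named `tag`;
    // when `id` is non-null, only those whose id attribute equals it
    static List<Element> children(Element parent, String tag, String id) {
        List<Element> result = new ArrayList<>();
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            // Whitespace between tags shows up as text nodes, so filter on node type
            if (n.getNodeType() == Node.ELEMENT_NODE && tag.equals(n.getNodeName())) {
                Element e = (Element) n;
                if (id == null || id.equals(e.getAttribute("id"))) {
                    result.add(e);
                }
            }
        }
        return result;
    }

    static List<String> moduleCodes(String xml, String year, String semester) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> codes = new ArrayList<>();
            for (Element y : children(doc.getDocumentElement(), "year", year)) {
                for (Element s : children(y, "semester", semester)) {
                    for (Element m : children(s, "module", null)) {
                        for (Element c : children(m, "code", null)) {
                            codes.add(c.getTextContent().trim());
                        }
                    }
                }
            }
            return codes;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(moduleCodes(SAMPLE, "1", "1")); // prints [HCA1105C, PROG1115C]
    }
}
```

Comparing the id attribute as a string avoids the NumberFormatException that Integer.parseInt would throw on an element with a missing or non-numeric id.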
We can get all the codes via DOM with the following code:
try {
    File inputFile = new File("src/resources/res.xml");
    DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
    Document doc = dBuilder.parse(inputFile);
    doc.getDocumentElement().normalize();
    // Every <module> in the document, regardless of year or semester
    NodeList nList = doc.getElementsByTagName("module");
    for (int i = 0; i < nList.getLength(); i++) {
        Node nNode = nList.item(i);
        if (nNode.getNodeType() == Node.ELEMENT_NODE) {
            Element eElement = (Element) nNode;
            System.out.println(eElement.getElementsByTagName("code").item(0).getTextContent());
        }
    }
} catch (Exception e) {
    JOptionPane.showMessageDialog(null, e.getMessage(), "Fatal Error", JOptionPane.ERROR_MESSAGE);
    System.exit(1);
}
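We could also get the codes by looping year -> semester -> module and then reading each module's code element. The code above gives the following result: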
HCA1105C PROG1115C MATH1103C ITE1107C MGMT1101C PROG1116C WAT1116C MATH1101C DBT1111C CAN2112C WAT2117C OSS2109C PROG2117C SDT2114C MATH2323C MCT2104C MCT2104C WAT2124C MGMT2104C SEM2C104 CMMTC104C
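The year -> semester -> module loop mentioned above could look roughly like this. It is a sketch under my own assumptions: the class CodesBySemester, the "Y1S1"-style keys, and the shortened SAMPLE document are invented for illustration. It relies on Element.getElementsByTagName, which searches only the descendants of the element it is called on.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CodesBySemester {

    // Shortened stand-in for the course XML (assumption, not the real file)
    static final String SAMPLE =
        "<course name=\"demo\">"
      + "<year id=\"1\">"
      + "<semester id=\"1\">"
      + "<module><code>HCA1105C</code></module>"
      + "<module><code>PROG1115C</code></module>"
      + "</semester>"
      + "<semester id=\"2\"><module><code>PROG1116C</code></module></semester>"
      + "</year>"
      + "<year id=\"2\">"
      + "<semester id=\"1\"><module><code>CAN2112C</code></module></semester>"
      + "</year>"
      + "</course>";

    // Groups module codes under keys such as "Y1S1" (year 1, semester 1),
    // preserving document order
    static Map<String, List<String>> codesBySemester(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, List<String>> result = new LinkedHashMap<>();
            NodeList years = doc.getElementsByTagName("year");
            for (int i = 0; i < years.getLength(); i++) {
                // getElementsByTagName returns elements only, so the cast is safe
                Element year = (Element) years.item(i);
                NodeList semesters = year.getElementsByTagName("semester");
                for (int j = 0; j < semesters.getLength(); j++) {
                    Element semester = (Element) semesters.item(j);
                    String key = "Y" + year.getAttribute("id") + "S" + semester.getAttribute("id");
                    List<String> codes = new ArrayList<>();
                    NodeList codeNodes = semester.getElementsByTagName("code");
                    for (int k = 0; k < codeNodes.getLength(); k++) {
                        codes.add(codeNodes.item(k).getTextContent().trim());
                    }
                    result.put(key, codes);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        codesBySemester(SAMPLE).forEach((key, codes) -> System.out.println(key + " " + codes));
    }
}
```

Grouping by year and semester keeps the context that a flat getElementsByTagName("module") listing loses.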