
How can I properly parse this XML file with XPath in Java?

This is the XML file that I need to parse:

<?xml version="1.0"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2002-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
   <book id="bk103">
      <author>Corets, Eva</author>
      <title>Maeve Ascendant</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-11-17</publish_date>
      <description>After the collapse of a nanotechnology 
      society in England, the young survivors lay the 
      foundation for a new society.</description>
   </book>
   <book id="bk104">
      <author>Corets, Eva</author>
      <title>Oberon's Legacy</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-03-10</publish_date>
      <description>In post-apocalypse England, the mysterious 
      agent known only as Oberon helps to create a new life 
      for the inhabitants of London. Sequel to Maeve 
      Ascendant.</description>
   </book>
   <book id="bk105">
      <author>Corets, Eva</author>
      <title>The Sundered Grail</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2001-09-10</publish_date>
      <description>The two daughters of Maeve, half-sisters, 
      battle one another for control of England. Sequel to 
      Oberon's Legacy.</description>
   </book>
   <book id="bk106">
      <author>Randall, Cynthia</author>
      <title>Lover Birds</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2003-09-02</publish_date>
      <description>When Carla meets Paul at an ornithology 
      conference, tempers fly as feathers get ruffled.</description>
   </book>
   <book id="bk107">
      <author>Thurman, Paula</author>
      <title>Splish Splash</title>
      <genre>Romance</genre>
      <price>4.95</price>
      <publish_date>2004-11-02</publish_date>
      <description>A deep sea diver finds true love twenty 
      thousand leagues beneath the sea.</description>
   </book>
   <book id="bk108">
      <author>Knorr, Stefan</author>
      <title>Creepy Crawlies</title>
      <genre>Horror</genre>
      <price>4.95</price>
      <publish_date>2005-12-06</publish_date>
      <description>An anthology of horror stories about roaches,
      centipedes, scorpions  and other insects.</description>
   </book>
   <book id="bk109">
      <author>Kress, Peter</author>
      <title>Paradox Lost</title>
      <genre>Science Fiction</genre>
      <price>6.95</price>
      <publish_date>2006-11-02</publish_date>
      <description>After an inadvertant trip through a Heisenberg
      Uncertainty Device, James Salway discovers the problems 
      of being quantum.</description>
   </book>
   <book id="bk110">
      <author>O'Brien, Tim</author>
      <title>Microsoft .NET: The Programming Bible</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2006-12-09</publish_date>
      <description>Microsoft's .NET initiative is explored in 
      detail in this deep programmer's reference.</description>
   </book>
   <book id="bk111">
      <author>O'Brien, Tim</author>
      <title>MSXML3: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>36.95</price>
      <publish_date>2007-12-01</publish_date>
      <description>The Microsoft MSXML3 parser is covered in 
      detail, with attention to XML DOM interfaces, XSLT processing, 
      SAX and more.</description>
   </book>
   <book id="bk112">
      <author>Galos, Mike</author>
      <title>Visual Studio 7: A Comprehensive Guide</title>
      <genre>Computer</genre>
      <price>49.95</price>
      <publish_date>2008-04-16</publish_date>
      <description>Microsoft Visual Studio 7 is explored in depth,
      looking at how Visual Basic, Visual C++, C#, and ASP+ are 
      integrated into a comprehensive development 
      environment.</description>
   </book>
</catalog>

I want to show every book, with its information, whose publish date is after 2005 and whose price is greater than 10. This is my Java code:

package xml;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class Main {

    public static void main(String[] args) throws XPathExpressionException, FileNotFoundException {

        XPathFactory factory = XPathFactory.newInstance();
        XPath xPath = factory.newXPath();

        XPathExpression xPathExpression = xPath.compile("catalog/book[publish_date>2005]/price | catalog/book[price>10]/price");

        File xmlDocument = new File("Books.xml");
        InputSource inputSource = new InputSource(new FileInputStream(xmlDocument));

        Object result = xPathExpression.evaluate(inputSource, XPathConstants.NODESET);

        NodeList nodeList = (NodeList)result;

        for (int i = 0; i < nodeList.getLength(); i++) {
            System.out.println("Info: " + nodeList.item(i).getFirstChild().getNodeValue());
        }

    }

}

How should I write this query so that it does this properly?


You almost have it right. The problem is that, according to the XPath specification, a relational comparison (< or >) implicitly converts each operand to a number. A node's text content is only a valid number if it consists entirely of ASCII digits, with an optional leading minus sign, an optional decimal point, and optional surrounding whitespace. Anything else converts to NaN, and every comparison against NaN is false.
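To see the conversion rule in action, here is a minimal sketch that evaluates a few expressions with the JDK's built-in XPath engine. The tiny inline document and the class and method names (NumberConversionDemo, asNumber, asBoolean) are made up for illustration and are not part of the question's code.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class NumberConversionDemo {

    // Hypothetical two-element document: one numeric value, one date.
    static final String XML = "<d><price> 44.95 </price><date>2002-12-16</date></d>";

    // Evaluates the expression and returns the result converted to a number.
    static double asNumber(String expression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return (Double) xPath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NUMBER);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Evaluates the expression and returns the result converted to a boolean.
    static boolean asBoolean(String expression) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            return (Boolean) xPath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Digits, a decimal point and surrounding whitespace: a valid number.
        System.out.println(asNumber("number(/d/price)"));
        // The hyphens make the date an invalid number, so it becomes NaN.
        System.out.println(asNumber("number(/d/date)"));
        // Comparing NaN with anything is false.
        System.out.println(asBoolean("/d/date > 2000"));
        // The year alone converts cleanly, so this comparison behaves as expected.
        System.out.println(asBoolean("substring-before(/d/date,'-') > 2000"));
    }
}
```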

A date like 2002-12-16 obviously does not qualify. However, you can extract the year, which does convert to a number, using substring-before. Both conditions also belong in a single predicate joined with and, because the | operator in your expression forms a union and matches books that satisfy either one:

XPathExpression xPathExpression = xPath.compile(
    "catalog/book[substring-before(publish_date,'-')>2005 and price>10]/price");
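For reference, here is a self-contained sketch that runs this predicate against a trimmed-down copy of the catalog. It assumes only four of the books, inlined as a string so that no Books.xml file is needed, and it selects title instead of price to make the output easier to read. The class and method names are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class YearFilterDemo {

    // Illustrative subset of the catalog from the question.
    static final String SAMPLE_XML =
          "<catalog>"
        + "<book id='bk101'><title>XML Developer's Guide</title>"
        + "<price>44.95</price><publish_date>2000-10-01</publish_date></book>"
        + "<book id='bk109'><title>Paradox Lost</title>"
        + "<price>6.95</price><publish_date>2006-11-02</publish_date></book>"
        + "<book id='bk110'><title>Microsoft .NET: The Programming Bible</title>"
        + "<price>36.95</price><publish_date>2006-12-09</publish_date></book>"
        + "<book id='bk112'><title>Visual Studio 7: A Comprehensive Guide</title>"
        + "<price>49.95</price><publish_date>2008-04-16</publish_date></book>"
        + "</catalog>";

    // Returns the titles of books published in a year after 2005 that cost more than 10.
    static List<String> matchingTitles(String xml) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList titles = (NodeList) xPath.evaluate(
                "catalog/book[substring-before(publish_date,'-')>2005 and price>10]/title",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < titles.getLength(); i++) {
                result.add(titles.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // bk101 fails the year test and bk109 fails the price test,
        // so only bk110 and bk112 are printed.
        for (String title : matchingTitles(SAMPLE_XML)) {
            System.out.println(title);
        }
    }
}
```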

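Take advantage of the XML date format and combine your conditions in a single predicate. A plain string comparison such as publish_date > '2005' only orders strings in XPath 2.0; the XPath 1.0 engine that ships with the JDK converts both operands of > to numbers, so strip the hyphens first and compare the date as a number:

/catalog/book[(translate(publish_date, '-', '') > 20051231) and (number(price) > 10)]

And thus

XPathExpression xPathExpression = xPath.compile("/catalog/book[(translate(publish_date, '-', '') > 20051231) and (number(price) > 10)]");
NodeList bookNodes = (NodeList) xPathExpression.evaluate(inputSource, XPathConstants.NODESET);
for (int i = 0; i < bookNodes.getLength(); i++) {
    // NodeList.item() returns a Node, so cast it (requires import org.w3c.dom.Element)
    Element bookElement = (Element) bookNodes.item(i);
    // getNodeValue() is null for element nodes; getTextContent() returns the text inside <author>
    System.out.println("Author: " + bookElement.getElementsByTagName("author").item(0).getTextContent());
}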

You'll need to print the remaining tags in the same way. Also, if your book elements might not all contain every expected child node, check the list returned by getElementsByTagName() before calling item(0) on it.
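The check on getElementsByTagName() could look something like the sketch below. The childText helper, the "(missing)" placeholder, and the second book (bk999, which deliberately has no author) are assumptions made for illustration and do not come from the original catalog.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SafeChildTextDemo {

    // bk101 comes from the question; bk999 is hypothetical and lacks an <author>.
    static final String SAMPLE_XML =
          "<catalog>"
        + "<book id='bk101'><author>Gambardella, Matthew</author>"
        + "<title>XML Developer's Guide</title></book>"
        + "<book id='bk999'><title>Untitled Draft</title></book>"
        + "</catalog>";

    // Returns the text of the first descendant with the given tag name,
    // or a placeholder when the element has no such descendant.
    static String childText(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        return matches.getLength() > 0 ? matches.item(0).getTextContent() : "(missing)";
    }

    // Builds one "id | author | title" line per book element.
    static List<String> describeBooks(String xml) {
        try {
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList books = (NodeList) xPath.evaluate(
                "/catalog/book", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                Element book = (Element) books.item(i);
                lines.add(book.getAttribute("id") + " | "
                    + childText(book, "author") + " | "
                    + childText(book, "title"));
            }
            return lines;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        for (String line : describeBooks(SAMPLE_XML)) {
            System.out.println(line);
        }
    }
}
```

Returning a placeholder keeps the loop simple; throwing an exception or returning null from the helper would work just as well if a missing tag should be treated as an error.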

