Parsing an HTML file using XPath in Java

Question

I have Java code that reads the source of a URL and saves it to a file (source.html). Now I want to extract some values from the saved page using XPath. Suppose I want to read the price: //div[@itemprop='price']//text()

How do I proceed from here? Can I run the XPath directly against the saved HTML page, or should I first convert it to an XML file and then use XPath? I have heard about HTML cleaners/parsers; should I use one here? Please do not just point to another website for answers. If you do, route me to a spot where I can find a direct and simple lesson. Modifying the code below would be highly helpful.

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

import org.jsoup.Jsoup;

public class jSoupContentRead {
    public static void main(String[] args) throws IOException {
        // Fetch and parse the page with jsoup.
        org.jsoup.nodes.Document doc = Jsoup.connect(
                "http://itunes.apple.com/us/book/a-way-home/id982665320?mt=11")
                .get();

        // try-with-resources closes the writer, so the page is actually flushed to source.html.
        try (PrintWriter op = new PrintWriter(new FileWriter("source.html"))) {
            op.write(doc.toString());
        }
        System.out.println(doc.toString());
    }
}

Answer 1

Generally (across languages), XPath is applied to a DOM structure. In PHP there is a standard procedure:

Get the HTML
Make it valid XML (this may be an optional step)
Load it into a DOMDocument object
Create a DOMXPath object from that document
Apply the XPath query to the DOMXPath instance. See an example in PHP.

I think there should be something similar in Java.
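For Java, the same sequence can be sketched with the DOM and XPath classes that ship with the JDK (javax.xml.parsers and javax.xml.xpath). This is only a minimal sketch under some assumptions: the class name XPathPriceSketch, the helper evaluate, and the sample markup are made up for illustration, and the input is assumed to be well-formed XML already. Real-world HTML is often not well-formed, so the saved page would probably need to pass through an HTML cleaner/parser first (the "make it valid XML" step) before the JDK parser accepts it.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathPriceSketch {

    // Hypothetical helper: builds a W3C DOM from well-formed markup and
    // returns the string value of the first node the XPath expression matches.
    static String evaluate(String wellFormedMarkup, String expression) throws Exception {
        // Build the DOM (assumes the markup is already valid XML).
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(wellFormedMarkup)));

        // Create the XPath object and apply the query to the DOM.
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, dom).trim();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample standing in for the cleaned-up contents of source.html.
        String sample = "<html><body><div itemprop=\"price\">$4.99</div></body></html>";
        System.out.println(evaluate(sample, "//div[@itemprop='price']//text()"));
    }
}
```

With the sample markup above, the expression from the question selects the text node inside the price div. For the real page, the string passed to evaluate would be the cleaned-up contents of source.html rather than the inline sample.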

Answer 1 (accepted, 2015-05-18 09:34:43)