Parsing an HTML file using XPath in Java

I have Java code that reads the source of a URL and saves it to a file (source.html). Now I want to extract some values from the saved page using XPath. Suppose I want to read the price: //div[@itemprop='price']//text()

How do I proceed from here? Can I run XPath directly against the saved HTML page, or should I first convert it to an XML file and then use XPath? I have heard about HTML cleaners/parsers; should I use one here? Please do not just point to another website for answers. If you do, direct me to a place with a direct and simple lesson. Modifying the code below would be very helpful.

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class jSoupContentRead {
    public static void main(String[] args) throws IOException {
        // Fetch the page and parse it into a jsoup Document.
        Document doc = Jsoup.connect(
                "http://itunes.apple.com/us/book/a-way-home/id982665320?mt=11")
                .get();

        // try-with-resources flushes and closes the writer, so source.html
        // is fully written instead of being left empty or truncated.
        try (PrintWriter op = new PrintWriter(new FileWriter("source.html"))) {
            op.write(doc.toString());
        }
        System.out.println(doc.toString());
    }
}

Generally (across languages), XPath is applied to a DOM structure. In PHP there is a standard procedure:

  1. Get the HTML.
  2. Make it valid XML (this step may be optional).
  3. Build a DOMDocument object from it.
  4. Build a DOMXPath object from the DOMDocument.
  5. Apply the XPath query to this DOMXPath instance. See an example in PHP.

I think there should be something similar in Java.
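
A rough Java counterpart of the steps above might look like the sketch below. It uses only the JDK's javax.xml.parsers and javax.xml.xpath packages and assumes the markup is already well-formed XML, which real-world HTML often is not. The class name XPathPriceSketch, the method extractPrice, and the sample markup with its price are made up for illustration; they do not come from the question.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name; a sketch, not the asker's code.
class XPathPriceSketch {

    // Builds a W3C DOM from well-formed markup and evaluates the question's
    // XPath against it. Assumes the input parses as XML.
    static String extractPrice(String wellFormedMarkup) throws Exception {
        // Step 3: build a DOM document from the markup.
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(
                new InputSource(new StringReader(wellFormedMarkup)));

        // Steps 4 and 5: create an XPath evaluator and apply the query.
        // evaluate(String, Object) returns the string value of the first match.
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("//div[@itemprop='price']//text()", dom).trim();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample standing in for the saved source.html.
        String sample =
                "<html><body><div itemprop=\"price\">$2.99</div></body></html>";
        System.out.println(extractPrice(sample));
    }
}
```

For a saved page that is not well-formed, step 2 would need an HTML parser first. Since the question already uses jsoup, one option is to convert the jsoup document with its org.jsoup.helper.W3CDom helper and pass the resulting DOM to the same XPath call.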

