Parsing an HTML file using XPath in Java
I have Java code that reads the source of a URL and saves it to a file (source.html). Now I want to extract some values from the saved page using XPath. Suppose I want to read the price:
//div[@itemprop='price']//text()
How should I proceed? Can I run XPath directly against the saved HTML page, or should I first convert it to an XML file and then use XPath? I have heard about HTML cleaners/parsers; should I use one here?
Please do not just point to another website for answers. If you do, route me to a spot with a direct and simple lesson. Modifying the code below would be highly helpful.
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import org.jsoup.Jsoup;

public class jSoupContentRead {
    public static void main(String[] args) throws IOException {
        // Fetch the page and let jsoup parse it into its own document model.
        org.jsoup.nodes.Document doc = Jsoup.connect(
                "http://itunes.apple.com/us/book/a-way-home/id982665320?mt=11")
                .get();
        // try-with-resources closes (and flushes) the writer, so source.html
        // is fully written instead of the leak being hidden by @SuppressWarnings.
        try (PrintWriter op = new PrintWriter(new FileWriter("source.html"))) {
            op.write(doc.toString());
        }
        System.out.println(doc.toString());
    }
}
Generally (across languages), XPath is applied to a DOM structure, so the HTML has to be parsed into a DOM before the expression can be evaluated. In PHP there is a standard procedure for this, and I think there should be something similar in Java.
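As a rough illustration of the "parse to a DOM, then apply XPath" idea in Java, here is a minimal sketch that uses only the JDK's built-in `javax.xml.xpath` API. It assumes the markup is well-formed (XHTML-like); the class name `XPathPriceSketch`, the helper `evaluate`, and the sample markup with its `$4.99` price are made up for the example and are not taken from the page in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathPriceSketch {

    // Hypothetical helper: parses well-formed markup into a W3C DOM and
    // evaluates the XPath expression against it.
    public static String evaluate(String markup, String expression) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(markup)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // With XPathConstants.STRING the result is the string value of the
        // first matching node, or an empty string when nothing matches.
        String result = (String) xpath.evaluate(expression, dom, XPathConstants.STRING);
        return result.trim();
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input; a real page would be read from source.html.
        String markup = "<html><body><div itemprop=\"price\">$4.99</div></body></html>";
        System.out.println(evaluate(markup, "//div[@itemprop='price']//text()"));
    }
}
```

Real-world HTML is often not well-formed, in which case the JDK parser above will reject it. Since the question already uses jsoup, one option is to load the saved file with `Jsoup.parse(File, String)` and convert the result to a W3C `Document` with jsoup's `org.jsoup.helper.W3CDom` helper before evaluating the XPath; newer jsoup releases also provide `selectXpath` on elements. Which of these is available depends on the jsoup version in use, so check its documentation first.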