Java: how can I use Jsoup to extract a particular piece of data from HTML?

Basically, I am trying to extract the current price of a stock from this link.

Looking at the page source, I want to be able to extract the number from this:

<meta itemprop="price"
        content="31.40" />

This is my Java code:

public double getCurrentPrice() throws IOException{
        String url = "https://www.google.com.hk/finance?q=0023&ei=yF14VYC4F4Wd0ASb64CoCw";
        Document doc = Jsoup.connect(url).get();
        Element content = doc.getElementById("meta");
}

And I keep getting this error:

456.0Exception in thread "main" java.lang.Error: Unresolved compilation problem: 
    Element cannot be resolved to a type

    at application.Trade.getCurrentPrice(Trade.java:45)
    at application.Trade.main(Trade.java:64)

The error message is not very helpful. How should I overcome this?

Import the correct classes. Also, meta is a tag, not an id, so you can't use getElementById to get that element. Select the element by its itemprop attribute and read the value from its content attribute.

A wildcard import only brings in the classes that sit directly in the named package, not those in its subpackages. For example, import org.jsoup.*; imports org.jsoup.Jsoup but not org.jsoup.nodes.Element, because Element lies in the org.jsoup.nodes package.
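The same rule can be checked with JDK classes alone. The sketch below is only an illustration: the class name WildcardImportDemo and the method looksLikePrice are made up for this example. It shows that import java.util.*; covers List and ArrayList, while Pattern, which lives in the subpackage java.util.regex, still needs its own import.

```java
import java.util.*;              // covers java.util.List, java.util.ArrayList, java.util.Arrays
import java.util.regex.Pattern;  // subpackage class: not covered by java.util.*

final class WildcardImportDemo {

    // Hypothetical helper: true when the whole text is a decimal number such as "31.40".
    static boolean looksLikePrice(String text) {
        return Pattern.matches("\\d+\\.\\d+", text);
    }

    public static void main(String[] args) {
        List<String> samples = new ArrayList<>(Arrays.asList("31.40", "abc"));
        for (String sample : samples) {
            System.out.println(sample + " -> " + looksLikePrice(sample));
        }
    }
}
```

Removing the second import line makes the compiler reject Pattern, which is the same kind of failure as the "Element cannot be resolved to a type" error in the question.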

Example:

import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class example {

    public static void main(String[] args) throws IOException {
        String url = "https://www.google.com.hk/finance?q=0023&ei=yF14VYC4F4Wd0ASb64CoCw";
        Document doc = Jsoup.connect(url).get();
        // meta is a tag, not an id, so select it by its itemprop attribute
        Element content = doc.select("meta[itemprop=price]").first();
        // the price is stored in the content attribute
        System.out.println(content.attr("content"));
    }
}

Output:

31.40
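Since the method in the question is declared to return a double, the attribute string still has to be converted. Here is a minimal sketch of one way to do that, assuming jsoup is on the classpath. The names PriceExtractor and parsePrice are hypothetical, and the HTML is an inline string copied from the snippet in the question, so the code runs without a network request.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class PriceExtractor {

    // Hypothetical helper: returns the price from the first <meta itemprop="price"> tag.
    static double parsePrice(String html) {
        Document doc = Jsoup.parse(html);
        Element meta = doc.select("meta[itemprop=price]").first();
        if (meta == null) {
            // first() returns null when the selector matches nothing
            throw new IllegalStateException("no <meta itemprop=\"price\"> element found");
        }
        return Double.parseDouble(meta.attr("content"));
    }

    public static void main(String[] args) {
        // Inline HTML standing in for the downloaded page
        String html = "<html><head><meta itemprop=\"price\" content=\"31.40\" /></head><body></body></html>";
        System.out.println(parsePrice(html));
    }
}
```

The null check is there because first() returns null when nothing matches, which would otherwise show up as a NullPointerException on attr.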

Edit:

To know which classes you should import, consider this statement:

Document doc 

Here you are declaring a variable of type Document, so you need to import the Document class. If you read the jsoup API documentation, you can see its class hierarchy.

Document is a class in the org.jsoup.nodes package, so you import it with import org.jsoup.nodes.Document;. You have to read the API to find this out. In any case, IDEs such as NetBeans and Eclipse suggest the classes to import, which is easy and saves a lot of time.
