
Jsoup Java scraping ticker symbol

I understand that this code scrapes the title "Google Inc (GOOG)" from http://finance.yahoo.com/q?s=goog :

    String name = doc.select(".title h2").first().text();

I was wondering how to scrape the title and the ticker symbol separately, as "Google Inc" and "GOOG".

Yahoo Finance ticker symbol

(1) Solution if you have to scrape:

This is a short answer that omits exception handling, but it works out of the box.

    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class TickerScraper {
        public static void main(String[] args) throws IOException {
            // fetch the HTML and parse it into a Document
            String url = "http://finance.yahoo.com/q?s=goog";
            Document doc = Jsoup.connect(url).get();

            // locate the quote summary header, then its title div, then the h2 tag
            Element header = doc.select("div[id=yfi_rt_quote_summary]").get(0);
            Element title = header.select("div[class=title]").get(0);
            String h2 = title.select("h2").get(0).text();

            // split on the open parenthesis (escaped for the regex) and strip the close parenthesis
            // TODO - use a regular expression to handle headings that contain multiple "()" pairs
            String[] parts = h2.split("\\(");
            String name = parts[0].trim();
            String shortname = parts[1].replace(")", "").trim();
            System.out.println(name);
            System.out.println(shortname);
        }
    }

Output looks like this:

    Google Inc.
    GOOG
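
The TODO in the code above mentions headings that contain more than one pair of parentheses. One possible way to handle that, sketched here with only the standard `java.util.regex` classes, is to treat the last parenthesized group at the end of the heading as the ticker symbol. The class name `TickerSplitter` and the method `split` are hypothetical names chosen for illustration, and the rule "the ticker is the last group" is an assumption about how such headings are laid out, not something the page guarantees.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TickerSplitter {
    // Greedy prefix, then the LAST "(...)" group at the end of the heading.
    // Assumption: the ticker itself contains no parentheses.
    private static final Pattern HEADING =
            Pattern.compile("^(.*)\\(([^()]+)\\)\\s*$");

    /**
     * Splits a heading such as "Google Inc. (GOOG)" into {name, ticker}.
     * If the heading has no trailing parenthesized group, the whole
     * heading is returned as the name and the ticker is empty.
     */
    public static String[] split(String heading) {
        Matcher m = HEADING.matcher(heading);
        if (m.matches()) {
            return new String[] { m.group(1).trim(), m.group(2).trim() };
        }
        return new String[] { heading.trim(), "" };
    }

    public static void main(String[] args) {
        String[] parts = split("Google Inc. (GOOG)");
        System.out.println(parts[0]); // Google Inc.
        System.out.println(parts[1]); // GOOG

        // A made-up heading with an extra pair of parentheses in the name
        String[] tricky = split("Foo (Holdings) Inc. (FOO)");
        System.out.println(tricky[0]); // Foo (Holdings) Inc.
        System.out.println(tricky[1]); // FOO
    }
}
```

In the scraper, the `h2` string would be passed to `TickerSplitter.split(h2)` in place of the `h2.split("\\(")` call. Because the prefix group is greedy, any earlier parentheses stay in the company name.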

(2) Solution if you don't have to scrape:

Here is a really nice post showing how to download Yahoo data programmatically.

I am also an R user, and it is extremely easy to get Yahoo Finance data inside R. You can do the analysis there and save the results to a file or database if you want. :)

You want to scrape the ids "yfs_184_goog", "yfs_c63_goog" and "yfs_p43_goog".

Those are the big black numbers, the little red/green numbers next to them and the percentage.

"Screen scrape" with Jsoup with element who has ID
