How to find all anchors matching a word with Jsoup?

Thank you in advance for your time. The code is supposed to connect to the website and scrape the OS from the line that contains a word entered by the user. It should search for the word, go to that line, and read the OS attribute on that line. I don't see why my code is not working and would appreciate some help.

Here is the website http://www.tabletpccomparison.net/

Here is the code:

import java.io.IOException;
import java.util.Iterator;
import java.util.Scanner;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ExtraPart1 {
public static void main(String args[]) throws IOException{
    Scanner input = new Scanner(System.in);
    String word = "";
    System.out.println("Type in what you are trying to search for.");
    word = input.nextLine();
    System.out.println("This program will find a quality from a website for it");
    String URL = "http://www.tabletpccomparison.net/";
    Document doc = Jsoup.connect(URL).get();
    Elements elements = doc.select("a");
    for(Element e : elements){
        if(e.equals(word)){
            String next_word = e.getElementsByClass("tableJX2ope_sis").text();
            System.out.print(next_word);
        }
    }

}
}

The problem lies here:

if(e.equals(word)){
        String next_word = e.getElementsByClass("tableJX2ope_sis").text();
        System.out.print(next_word);
}

e is an Element, but it is compared to a String, so equals never returns true and the body of the if is never executed. Compare the element's text instead:

if(e.text().equals(word)) {
   // ...
}
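
To make the difference concrete, here is a minimal sketch that runs against an inline HTML string instead of the live site. The markup and the linkMatches helper are assumptions made for illustration; the helper also trims and ignores case, which may or may not be what you want for user input.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ElementTextCompare {
    // Hypothetical helper: true when the link's visible text equals the word,
    // ignoring case and surrounding whitespace.
    static boolean linkMatches(Element link, String word) {
        return link.text().trim().equalsIgnoreCase(word.trim());
    }

    public static void main(String[] args) {
        // Assumed markup; the real page's structure may differ.
        Document doc = Jsoup.parse("<a href='#'>Apple iPad</a>");
        Element link = doc.selectFirst("a");
        String word = "apple ipad";

        System.out.println(link.equals(word));         // false: an Element is never equal to a String
        System.out.println(link.text().equals(word));  // false: same text, different case
        System.out.println(linkMatches(link, word));   // true
    }
}
```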

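You can also simplify the for loop by letting the selector do the matching. Note that :containsOwn matches case-insensitively on a substring of the element's own text, so it is looser than equals: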

String cssQuery = String.format("a:containsOwn(%s)", word);
Elements elements = doc.select(cssQuery);

for(Element e : elements){
    String nextWord = e.getElementsByClass("tableJX2ope_sis").text();
    System.out.print(nextWord);
}
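
One thing to watch for: getElementsByClass only looks at the element itself and its descendants. If the OS cell sits in a sibling td of the same table row (which the td.tableJX2ope_sis selector in the other answer suggests, but I have not verified against the live page), you would have to climb from the link to its row first. A rough sketch under that assumption, using made-up sample markup and a hypothetical osForLinks helper; Element.closest requires a reasonably recent Jsoup release:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowLookup {
    // Hypothetical markup, assumed to mirror the class names used in this thread.
    static final String SAMPLE_HTML =
          "<table class='tableJX'>"
        + "<tr><td><a href='#'>Apple iPad</a></td>"
        + "<td class='tableJX2ope_sis'><span class='field'>iOS</span></td></tr>"
        + "<tr><td><a href='#'>Nexus 7</a></td>"
        + "<td class='tableJX2ope_sis'><span class='field'>Android</span></td></tr>"
        + "</table>";

    // For every link whose own text contains the word, return the text of the
    // OS cell found in the same table row.
    static List<String> osForLinks(Document doc, String word) {
        List<String> result = new ArrayList<>();
        for (Element link : doc.select("a:containsOwn(" + word + ")")) {
            Element row = link.closest("tr");   // nearest enclosing <tr>, or null
            if (row == null) {
                continue;                       // link is not inside a table row
            }
            result.add(row.select("td.tableJX2ope_sis").text());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(osForLinks(doc, "iPad"));
    }
}
```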

Your CSS selector should directly target the table you are trying to scrape. If you select only on a, you have to iterate over every link in the document.

    String selector = String.format(
         "table.tableJX tr:contains(%s) > td.tableJX2ope_sis > span.field", word);

    for (Element os : doc.select(selector))
        System.out.println(os.ownText());
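
If you want to try the selector without hitting the site, here is a small self-contained sketch that runs it against an inline table. The sample markup and the osByRowText helper name are assumptions for illustration only. Keep in mind that tr:contains matches the word anywhere in the row's text, not just in the link, so searching for an OS name would also match.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class RowSelector {
    // Hypothetical markup, assumed to mirror the class names in the selector above.
    static final String SAMPLE_HTML =
          "<table class='tableJX'>"
        + "<tr><td><a href='#'>Apple iPad</a></td>"
        + "<td class='tableJX2ope_sis'><span class='field'>iOS</span></td></tr>"
        + "<tr><td><a href='#'>Nexus 7</a></td>"
        + "<td class='tableJX2ope_sis'><span class='field'>Android</span></td></tr>"
        + "</table>";

    // Returns the OS text of every row whose text contains the word.
    static List<String> osByRowText(Document doc, String word) {
        String selector = String.format(
            "table.tableJX tr:contains(%s) > td.tableJX2ope_sis > span.field", word);
        List<String> result = new ArrayList<>();
        for (Element os : doc.select(selector)) {
            result.add(os.ownText());
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(osByRowText(doc, "iPad"));
        // The word is matched against the whole row, so this also finds a row:
        System.out.println(osByRowText(doc, "Android"));
    }
}
```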
