
HTML parse in Java?

I'm trying to use jsoup to extract some specific text from a website, but it doesn't work for me. (The page URL is in the code below.)

I'm interested in the number "43", shown in red text at the top right of the page.

This is what I tried:

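import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

String test;

public void scan(String url) throws Exception {

    Document document = Jsoup.connect(url).get();
    // Intended to select the red vote count at the top right of the page
    Elements votes = document.select("#malicious-votes .pull-right");
    test = votes.text();
}

// Returns the text scraped by scan(); test is a String field, not a method
public String returnVotes() {
    return test;
}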

~ ~ ~

public static void main(String[] args) throws Exception {

    Scan_VirusTotal virustotal = new Scan_VirusTotal();
    virustotal.scan("https://www.virustotal.com/sv/url/cbf2d00f974d212b6700e7051f8b23f2038e876173066af41780e09481ef1cdd/analysis/1407146081");
    System.out.println(virustotal.returnVotes());
}

This prints nothing. Other elements work fine with this exact method, so I'm really confused as to why this particular piece of text won't parse.

Ideas? Thanks.

EDIT: added some HTML from the page, as requested:

<div style="display:block" class="pull-right value text-red" id="malicious-votes">44</div>

Try using this instead:

Elements votes = document.select("#malicious-votes");
test = votes.text();
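A minimal sketch of the corrected scanner, assuming jsoup is on the classpath. The class name VoteScanner, its method names, and the -1 return value for "not found" are hypothetical choices for illustration. Fetching and parsing are split so the selector can be checked against the HTML snippet from the question's edit without a network call, and the method returns an int, which the original returnVotes() seems to have been aiming for.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;

// Hypothetical helper class; assumes jsoup is on the classpath.
public class VoteScanner {

    // Extracts the malicious-vote count from an already parsed page.
    // Returns -1 (an illustrative convention) if the element is missing
    // or its text is not an integer.
    public static int maliciousVotes(Document document) {
        // Select the element that carries the id itself, not a descendant of it.
        Element votes = document.selectFirst("#malicious-votes");
        if (votes == null) {
            return -1;
        }
        try {
            return Integer.parseInt(votes.text().trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Fetches the page over HTTP and delegates to maliciousVotes.
    public static int scan(String url) throws IOException {
        Document document = Jsoup.connect(url).get();
        return maliciousVotes(document);
    }

    public static void main(String[] args) {
        // The markup posted in the question's edit, parsed offline.
        String html = "<div style=\"display:block\" class=\"pull-right value text-red\""
                + " id=\"malicious-votes\">44</div>";
        System.out.println(maliciousVotes(Jsoup.parse(html))); // prints 44
    }
}
```

Keeping maliciousVotes() independent of the network also makes it easy to notice if the live page stops matching the snippet, since the method then returns -1 instead of silently printing an empty string.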

I tried $("#malicious-votes .pull-right") in the browser console on the given page, and it returns an empty array. But $("#malicious-votes") returns the vote div, which itself has the class pull-right.
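The same comparison can be run with jsoup against the snippet from the question's edit, without a browser. This is a sketch assuming jsoup is available; the class name SelectorCheck and its count helper are hypothetical. The descendant selector matches nothing, and calling text() on an empty Elements yields an empty string, which is consistent with the original program printing nothing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

// Hypothetical class name; assumes jsoup is on the classpath.
public class SelectorCheck {

    // Markup copied from the question's edit.
    static final String HTML = "<div style=\"display:block\" class=\"pull-right value text-red\""
            + " id=\"malicious-votes\">44</div>";

    // Returns how many elements a selector matches in the snippet.
    static int count(String selector) {
        Document document = Jsoup.parse(HTML);
        return document.select(selector).size();
    }

    public static void main(String[] args) {
        Document document = Jsoup.parse(HTML);

        Elements descendant = document.select("#malicious-votes .pull-right");
        Elements self = document.select("#malicious-votes");

        // Mirrors $("...") in the browser console: 0 matches vs. 1 match.
        System.out.println(descendant.size()); // 0
        System.out.println(self.size());       // 1

        // text() of an empty Elements is "", so printing it shows nothing.
        System.out.println("[" + descendant.text() + "]"); // []
        System.out.println("[" + self.text() + "]");       // [44]
    }
}
```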

Your selector should be:

"#malicious-votes", not "#malicious-votes .pull-right".

"#malicious-votes .pull-right" selects any elements with class pull-right that are descendants of #malicious-votes. What you want is the #malicious-votes element itself.

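To make the difference between the descendant combinator (a space) and a compound selector (no space) concrete, this sketch runs both against two snippets: the structure from the question, where the id and the class sit on the same div, and a hypothetical nested structure where the class sits on a child. The class name CombinatorDemo, the NESTED markup, and the textOf helper are illustrative assumptions, and jsoup is assumed to be on the classpath. If you want to keep the class check, "#malicious-votes.pull-right" (no space) requires both the id and the class on the same element.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo; assumes jsoup is on the classpath.
public class CombinatorDemo {

    // Structure from the question: id and class on the same element.
    static final String SAME_ELEMENT =
            "<div id=\"malicious-votes\" class=\"pull-right\">44</div>";

    // Hypothetical structure: the class sits on a child of the id element.
    static final String NESTED =
            "<div id=\"malicious-votes\"><span class=\"pull-right\">44</span></div>";

    // Returns the combined text of every element the selector matches.
    static String textOf(String html, String selector) {
        Document document = Jsoup.parse(html);
        return document.select(selector).text();
    }

    public static void main(String[] args) {
        // Descendant combinator (space): only matches a pull-right element inside the div.
        System.out.println("[" + textOf(SAME_ELEMENT, "#malicious-votes .pull-right") + "]"); // []
        System.out.println("[" + textOf(NESTED, "#malicious-votes .pull-right") + "]");       // [44]

        // Compound selector (no space): id and class on the same element.
        System.out.println("[" + textOf(SAME_ELEMENT, "#malicious-votes.pull-right") + "]");  // [44]
        System.out.println("[" + textOf(NESTED, "#malicious-votes.pull-right") + "]");        // []
    }
}
```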