How to get id from html Objects with Jsoup - Java

I want to find the id of HTML <object> elements with Jsoup.

<object id="gamediv"></object>

I tried:

String startingURL = "http://www.example.com";
try {
    doc = Jsoup.connect(startingURL)
            .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
            .referrer("http://www.google.com")
            .timeout(1000*5) //it's in milliseconds, so this means 5 seconds.              
            .get();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Elements get = doc.select("object");

for (Element elem : get){
    if (get.attr("id") != null){
        System.out.println(get.attr("id"));
    }
}

but nothing happens. Any help please?

First of all, you can reduce your code to simply:

for (Element elem : doc.select("object[id]")) {
    System.out.println(elem.attr("id"));
}
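
To check the selector on its own, without involving the network, here is a minimal sketch that parses a hard-coded string. The class name ObjectIdDemo, the helper objectIds and the sample markup are made up for illustration; only Jsoup.parse, select and attr come from Jsoup. Note that attr returns an empty string rather than null when the attribute is missing, which is why the [id] part of the selector is the more reliable filter.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ObjectIdDemo {

    // Collects the id of every <object> element that carries an id attribute.
    // (Hypothetical helper, not part of Jsoup.)
    static List<String> objectIds(Document doc) {
        List<String> ids = new ArrayList<>();
        for (Element elem : doc.select("object[id]")) {
            ids.add(elem.attr("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Made-up markup, parsed from a string so no connection is needed.
        String html = "<html><body>"
                + "<object id=\"gamediv\"></object>"
                + "<object></object>"
                + "</body></html>";
        Document doc = Jsoup.parse(html);

        // Only the first <object> has an id, so only that one is collected.
        System.out.println(objectIds(doc));
    }
}
```

With the sample markup above this prints [gamediv]. If the same loop prints nothing for your real page, the selector is not the problem; the element is simply not in the document Jsoup received.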

Secondly, if doc doesn't contain the object you are looking for, it means the server didn't send it in the response. There can be several reasons; the most common ones are:

  • an incorrect User-Agent header,
  • the HTML is generated in the browser by JavaScript.

The first case doesn't seem to apply here. If the content is generated dynamically, you should probably use another library, since Jsoup is only a parser, not a browser emulator. If you are looking for a more powerful tool, take a look at web drivers such as Selenium.
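
To illustrate the JavaScript case, here is a small sketch under assumed inputs: two made-up pages, one where the object is in the markup itself and one where a script would write it at runtime. The class StaticHtmlCheck and the helper presentInStaticHtml are hypothetical names. Jsoup never runs the script, so the second page has no matching element as far as Jsoup is concerned.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticHtmlCheck {

    // Returns true if the markup, parsed as-is (no scripts are executed),
    // contains at least one element matching the CSS query.
    // (Hypothetical helper, not part of Jsoup.)
    static boolean presentInStaticHtml(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return !doc.select(cssQuery).isEmpty();
    }

    public static void main(String[] args) {
        // Made-up page where the server sends the element directly.
        String staticPage = "<body><object id=\"gamediv\"></object></body>";

        // Made-up page where a browser would create the element via JavaScript.
        String scriptedPage = "<body><script>"
                + "document.write('<object id=\"gamediv\"></object>');"
                + "</script></body>";

        System.out.println(presentInStaticHtml(staticPage, "object[id]"));   // true
        System.out.println(presentInStaticHtml(scriptedPage, "object[id]")); // false
    }
}
```

A quick way to apply the same check to a real site is to look at the raw page source (not the browser's element inspector): if the object tag is missing there, Jsoup won't see it either.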
