How to get id from html Objects with Jsoup - Java

I want to find the id of html objects with Jsoup.

<object id="gamediv"></object>

I tried:

String startingURL = "http://www.example.com";
try {
    doc = Jsoup.connect(startingURL)
            .userAgent("Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0")
            .referrer("http://www.google.com")
            .timeout(1000*5) //it's in milliseconds, so this means 5 seconds.              
            .get();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Elements get = doc.select("object");

for (Element elem : get){
    if (get.attr("id") != null){
        System.out.println(get.attr("id"));
    }
}

but nothing happens. Any help please?

First of all, you can simplify your code to:

for (Element elem : doc.select("object[id]")) {
    System.out.println(elem.attr("id"));
}

Secondly, if doc doesn't contain the object you are looking for, it means the server didn't send it. The most common reasons are:

  • an incorrect user agent header,
  • the HTML is generated in the browser by JavaScript.

The first case doesn't seem to apply here, so for dynamic content you should probably use another library, since Jsoup is only a parser, not a browser emulator. If you are looking for a more powerful tool, take a look at web drivers like Selenium.
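To tell the two cases apart, one option is to inspect the raw response body before any parsing (with Jsoup it can be obtained via Jsoup.connect(url).execute().body()). The sketch below is only a rough diagnostic that uses the standard library: the class name RawHtmlCheck and the helper objectIdsInRawHtml are hypothetical, the sample pages are made up, and the regex is deliberately crude rather than a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Jsoup.
class RawHtmlCheck {
    // Crude pattern: an opening <object ...> tag with a quoted id attribute.
    // It is only meant for a quick check, not for real HTML parsing.
    private static final Pattern OBJECT_ID = Pattern.compile(
            "<object\\b[^>]*?\\bid\\s*=\\s*[\"']([^\"']*)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the ids of all <object> tags found in the raw HTML text.
    static List<String> objectIdsInRawHtml(String html) {
        List<String> ids = new ArrayList<>();
        Matcher m = OBJECT_ID.matcher(html);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Assumed sample: the server sends the tag directly.
        String staticPage =
                "<html><body><object id=\"gamediv\"></object></body></html>";
        // Assumed sample: the tag would only be created later by a script.
        String dynamicPage =
                "<html><body><script>/* builds the tag at runtime */</script></body></html>";

        System.out.println(objectIdsInRawHtml(staticPage));  // prints [gamediv]
        System.out.println(objectIdsInRawHtml(dynamicPage)); // prints []
    }
}
```

If the list is empty for the real response body, the tag was never sent by the server, and no Jsoup selector will find it.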
