
Jsoup and getting data from a table

I have tried similar code on the BBC and Wikipedia websites, and there I can access and handle data from the tables. What is my mistake when I try to do the same on this website? The code below returns some data, but when I change the selector to doc.getElementsByClass("tabela2"); I do not get the specific figures inside the table.

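    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.select.Elements;

    public class TableScraper {
        public static void main(String[] args) {
            String url = "https://www.itau.com.br/investimentos-previdencia/fundos/rentabilidade-uniclass";
            Document doc;
            try {
                doc = Jsoup.connect(url)
                        .userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36")
                        .get();
            } catch (IOException e) {
                e.printStackTrace();
                return; // nothing was fetched, so doc would be null below
            }
            // Select every element that carries the "contentAllInt" class
            Elements table = doc.getElementsByClass("contentAllInt");
            System.out.println("table: " + table);
        }
    }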

Thanks

The website in question loads its content dynamically with AJAX after the initial page load, so the table is rendered by JavaScript.

Jsoup does not execute JavaScript; it is an HTML parser, not a browser. It only sees the HTML the server returns in the first response, so Jsoup alone can't be used here.
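To see the effect offline, here is a minimal sketch that parses two hypothetical, simplified HTML strings with Jsoup (they are not the real Itaú markup). In the first, the table exists only inside a script that would insert it at runtime, as an AJAX-driven page does; in the second, the table is already part of the HTML. Only the class names contentAllInt and tabela2 come from the question; the class StaticVsScripted and the markup are illustrative.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticVsScripted {
    // Hypothetical page whose table is only created by JavaScript at runtime.
    static final String SCRIPTED =
            "<html><body><div class=\"contentAllInt\"></div>"
            + "<script>document.querySelector('.contentAllInt').innerHTML ="
            + " '<table class=\"tabela2\"><tr><td>1.23</td></tr></table>';</script>"
            + "</body></html>";

    // Hypothetical page whose table is already present in the server's HTML.
    static final String STATIC =
            "<html><body><div class=\"contentAllInt\">"
            + "<table class=\"tabela2\"><tr><td>1.23</td></tr></table>"
            + "</div></body></html>";

    // Jsoup builds a DOM from the markup but never runs the <script>;
    // the script body stays plain text and creates no elements.
    static int countTables(String html) {
        Document doc = Jsoup.parse(html);
        return doc.getElementsByClass("tabela2").size();
    }

    public static void main(String[] args) {
        System.out.println("scripted page: " + countTables(SCRIPTED) + " table(s)");
        System.out.println("static page:   " + countTables(STATIC) + " table(s)");
    }
}
```

Running it prints 0 tables for the scripted page and 1 for the static page, which matches the symptom in the question: the selector is fine, but the element is not in the HTML Jsoup receives.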

Two solutions:

1) Use a real browser, e.g. via Selenium, to access this content from Java. Drawbacks: you add a new dependency, real browsers are resource-hungry, and it may run slowly.

2) Investigate the AJAX calls directly, then mimic those calls and interpret the results. The response is usually JSON rather than HTML, so you may need another parsing library. I have not looked into this option for your site, but it is usually the faster one.
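Option 1 from the list above could look roughly like this sketch. It assumes Selenium 4 (selenium-java) and jsoup are on the classpath and that Chrome with a matching ChromeDriver is installed; it has not been tested against this particular site. It waits for an element with the class tabela2 (the class named in the question) to appear, then hands the rendered page source to Jsoup so the existing parsing code can be reused. The class name RenderedTableScraper, the 15-second timeout and the headless flag are arbitrary choices.

```java
import java.time.Duration;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class RenderedTableScraper {
    static final String TARGET_URL =
            "https://www.itau.com.br/investimentos-previdencia/fundos/rentabilidade-uniclass";
    static final String TABLE_CLASS = "tabela2"; // class name taken from the question

    public static void main(String[] args) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new"); // run Chrome without opening a window
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get(TARGET_URL);
            // Block until the page's JavaScript has inserted at least one matching element.
            new WebDriverWait(driver, Duration.ofSeconds(15))
                    .until(ExpectedConditions.presenceOfElementLocated(By.className(TABLE_CLASS)));
            // Parse the rendered DOM with Jsoup, exactly as with a static page.
            Document doc = Jsoup.parse(driver.getPageSource(), TARGET_URL);
            Elements tables = doc.getElementsByClass(TABLE_CLASS);
            System.out.println("tables found: " + tables.size());
        } finally {
            driver.quit(); // always shut the browser down, even after a timeout
        }
    }
}
```

Waiting explicitly for the element avoids reading the page source before the AJAX response has been rendered, which would reproduce the original empty result.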
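Option 2 can be sketched with the JDK's built-in java.net.http.HttpClient (Java 11+), so no browser is needed. The endpoint below is a hypothetical placeholder, not the site's real API: the actual URL, method and headers have to be copied from the request visible in the browser's developer tools (Network tab, XHR/Fetch filter). The Accept and X-Requested-With headers are common on AJAX requests but are assumptions here, as is the class name AjaxReplay.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class AjaxReplay {
    // HYPOTHETICAL endpoint: replace it with the URL the page actually calls.
    static final String ENDPOINT = "https://example.com/hypothetical/rentabilidade.json";

    // Build a GET request that imitates the browser's AJAX call.
    static HttpRequest buildRequest(String endpoint) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(15))
                // Assumed headers; copy whatever the real request sends.
                .header("Accept", "application/json")
                .header("X-Requested-With", "XMLHttpRequest")
                .header("User-Agent", "Mozilla/5.0")
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(ENDPOINT), HttpResponse.BodyHandlers.ofString());
        System.out.println("status: " + response.statusCode());
        // The body is typically JSON; parse it with a JSON library such as Jackson or Gson.
        System.out.println(response.body());
    }
}
```

Keeping the request construction in its own method makes it easy to adjust headers or switch to POST once the real call has been inspected.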
