
Receive table data from external website in Android using jSoup

Inside my Android app I want to receive some table data from an external website.

Let's say website page X has this table inside its HTML:

<table summary="Foo" border="0" bgcolor="#ffffff" cellpadding="0"> </table>

How would I receive the strings inside all the cells of the second column of the table (top to bottom)?

So far I have done the following:

  1. Create an AsyncTask

  2. Use jSoup to scrape the external website.

I used the following code inside my AsyncTask:

ArrayList<String> list = new ArrayList<String>(); // table data
Document document = Jsoup.connect(url).get();
Elements nextTurns = document.select(":contains(Foo) td:eq(1)");
for (Element nextTurn : nextTurns) {
    list.add(nextTurn.text());
}

When I run the code, it seems to hang at the document.select statement while the GC goes crazy. After a very long time it does get past the document.select statement, and most of the data is correct, but the result still contains random other elements from the website.

I am pretty sure this is completely wrong:

Elements nextTurns = document.select(":contains(Foo) td:eq(1)"); 

But I am unsure how to fix it because the table also lacks any IDs. And I find this page confusing.

How can I fix the select statement and/or the for loop so that it fills the ArrayList with data from the second table column?

Edit: after removing contains(Foo) it is now really fast, so that's one problem less. I still need help with traversing the DOM elements to the second column of the table without picking up a bunch of random parts of the website.

This should be the correct selection, guessing based on your post:

document.select("table[summary=Foo] tr");

Loop through the rows selected above and, from each row, take the second <td>, which is at index 1 of that row's cells.
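The loop described above could look roughly like the following. This is only a sketch under a few assumptions: the class name SecondColumnExample, the method name secondColumn and the inline sample HTML are made up for illustration, and the real page is assumed to use plain <td> cells in every data row. Inside the AsyncTask you would obtain the Document with Jsoup.connect(url).get() as in the question, instead of parsing a string.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SecondColumnExample {

    // Collects the text of the second <td> of every row in the table
    // whose summary attribute equals "Foo".
    static List<String> secondColumn(Document document) {
        List<String> list = new ArrayList<>();
        // Attribute selector: only rows inside the table with summary="Foo".
        Elements rows = document.select("table[summary=Foo] tr");
        for (Element row : rows) {
            Elements cells = row.select("td");
            // Skip header rows and rows with fewer than two cells.
            if (cells.size() > 1) {
                list.add(cells.get(1).text());
            }
        }
        return list;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup standing in for website page X.
        String html =
              "<table summary=\"Foo\" border=\"0\">"
            + "<tr><td>a1</td><td>b1</td></tr>"
            + "<tr><td>a2</td><td>b2</td></tr>"
            + "</table>"
            + "<table summary=\"Bar\">"
            + "<tr><td>x1</td><td>y1</td></tr>"
            + "</table>";

        Document document = Jsoup.parse(html);
        System.out.println(secondColumn(document)); // prints [b1, b2]
    }
}
```

Selecting on the summary attribute keeps the query scoped to the one table, whereas :contains(Foo) matches every element whose text contains "Foo", which probably explains both the slowness and the unrelated elements in the result.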
