
Get data from table using jSoup

I am looking to get data from the table on http://www.sportinglife.com/greyhounds/abc-guide using jSoup. I would like to put this data into some kind of table within my Java program that I can then use in my code.

I'm not too sure how to do this. I have been playing around with jSoup and can currently print each cell of the table using a while loop, but obviously I can't rely on this, as the number of cells in the table will change.

    Document doc = Jsoup.connect("http://www.sportinglife.com/greyhounds/abc-guide").get();
    int n = 0;
    while (n < 100) {
        Element tableHeader = doc.select("td").get(n);

        for (Element element : tableHeader.children()) {
            // Here you can do something with each element
            System.out.println(element.text());
        }
        n++;
    }

Any idea of how I could do this?

There are just a few things you have to implement to achieve your goal. Take a look at this Groovy script: https://gist.github.com/wololock/568b9cc402ea661de546. Now let's explain what we have here.

    List<Element> rows = document.select('table[id=ABC Guide] > tbody > tr')

Here we specify that we are interested in every row tr that is an immediate child of tbody, which is itself an immediate child of the table with id ABC Guide. In return you receive a list of Element objects that describe those tr rows.

    Map<String, String> data = new HashMap<>()

We will store the result in a simple hash map for further processing, e.g. putting the scraped data into a database.
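One thing to keep in mind with a map: keys are unique, so if the same dog name appeared in two rows, the later row would replace the earlier one, and a plain HashMap does not keep the rows in page order. The following is a minimal sketch, assuming you want to preserve row order. RowStore is a hypothetical class name, and the dog names and race strings are made-up sample values, not data from the site.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowStore {

    // Builds a map from {dog, race} pairs, keeping the order in which
    // each dog name was first added.
    static Map<String, String> store(String[][] rows) {
        Map<String, String> data = new LinkedHashMap<>();
        for (String[] row : rows) {
            // A repeated dog name replaces the race stored earlier.
            data.put(row[0], row[1]);
        }
        return data;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for scraped table cells.
        String[][] rows = {
            {"Dog A", "Race 1"},
            {"Dog B", "Race 2"},
            {"Dog A", "Race 3"},
        };
        store(rows).forEach((dog, race) -> System.out.println(dog + " -> " + race));
    }
}
```

With these sample rows the map ends up with two entries, and Dog A is mapped to Race 3. If you need every row, including repeated names, a list of rows would likely fit better than a map.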

    for (Element row : rows) {
        String dog = row.select('td:eq(0)').text()
        String race = row.select('td:eq(1)').text()

        data.put(dog, race)
    }

Now we iterate over every Element. We select the content of the first cell as text, String dog = row.select('td:eq(0)').text(), and repeat this step for the second cell, String race = row.select('td:eq(1)').text(). Then we simply put those values into the hash map. That's all.
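For reference, the steps above might look roughly like this in plain Java. This is only a sketch, assuming the jsoup library is on the classpath. AbcGuideScraper and extract are hypothetical names, and the inline HTML is a made-up stand-in for the real page so the example runs without network access.

```java
import java.util.HashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AbcGuideScraper {

    // Reads the first two cells of every body row of the "ABC Guide" table.
    static Map<String, String> extract(Document document) {
        Map<String, String> data = new HashMap<>();
        for (Element row : document.select("table[id=ABC Guide] > tbody > tr")) {
            String dog = row.select("td:eq(0)").text();
            String race = row.select("td:eq(1)").text();
            // Rows without td cells (e.g. header rows) yield empty text; skip them.
            if (!dog.isEmpty()) {
                data.put(dog, race);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        // Made-up sample markup; for the live page you would instead use
        // Jsoup.connect("http://www.sportinglife.com/greyhounds/abc-guide").get()
        String html = "<table id=\"ABC Guide\"><tbody>"
                + "<tr><td>Dog A</td><td>Race 1</td></tr>"
                + "<tr><td>Dog B</td><td>Race 2</td></tr>"
                + "</tbody></table>";
        Document document = Jsoup.parse(html);
        extract(document).forEach((dog, race) -> System.out.println(dog + " -> " + race));
    }
}
```

Because the loop runs over whatever rows the selector returns, it does not depend on a fixed number of cells, which is the problem with the while (n < 100) approach in the question.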

I hope this example and its description help you with developing your application.

EDIT:

Java code sample: https://gist.github.com/wololock/8ccbc6bbec56ef57fc9e
