How to retrieve a specific table from a webpage using Jsoup [Android]

I am trying to retrieve a table from this URL. This is the table I need to retrieve:

 <table id="h2hSum" class="competitionRanking tablesorter"> 
              <thead> 
               <tr> 
                <th align="center">Team</th> 
                <th align="center">Played</th> 
                <th align="center">Win</th> 
                <th align="center">Draw</th> 
                <th align="center">Lose</th> 
                <th align="center">Score</th> 
                <th>Goals Scored</th> 
                <th>Goals Allowed</th> 
               </tr> 
              </thead> 
              <tbody> 
               <tr> 
                <td><a class="teamLink" href="/soccer-statistics/England/Premier-League-2016-2017/team_info_overall/676_Manchester_City_FC">Manchester City</a></td> 
                <td>140</td> 
                <td>47</td> 
                <td>38</td> 
                <td>55</td> 
                <td>188:205</td> 
                <td>1.34</td> 
                <td>1.46</td> 
               </tr> 
               <tr class="odd"> 
                <td><a class="teamLink" href="/soccer-statistics/England/Premier-League-2016-2017/team_info_overall/661_Chelsea_FC">Chelsea</a></td> 
                <td>140</td> 
                <td>55</td> 
                <td>38</td> 
                <td>47</td> 
                <td>205:188</td> 
                <td>1.46</td> 
                <td>1.34</td> 
               </tr> 
              </tbody> 
             </table>

This is what I tried:

private class SimpleTask1 extends AsyncTask<String, String, String>
{
    ProgressDialog loader;


    @Override
    protected void onPreExecute()
    {
        loader = new ProgressDialog(MainActivity.this, ProgressDialog.STYLE_SPINNER);
        loader.setMessage("loading engine");
        loader.show();

    }

    protected String doInBackground(String... urls)
    {
        String result1 = "";
        try {

            Document doc = Jsoup.connect(urls[0]).get();
            Element table = doc.select("table[class=competitionRanking tablesorter]").first();
            Iterator<Element> ite = table.select("td").iterator();

            ite.next();
            Log.w("Value 1: ",""+ ite.next().text());
            Log.w("Value 2: ",""+ ite.next().text());
            Log.w("Value 3: ",""+ ite.next().text());
            Log.w("Value 4: ",""+ ite.next().text());

        } catch (IOException e) {

        }
        return result1;
    }

    protected void onPostExecute(String sampleVal)
    {
        loader.dismiss();
        Log.e("OUTPUT",""+sampleVal);
    }
}

However, this throws an exception. I tried similar answers, but they differ because the tables there are accessed by class name or td width. What should I do so that I can access all the values in this table? Kindly help.

Problem

Iterator<Element> ite = table.select("td").iterator(); throws a NullPointerException.

Reason

After your first visit, the site appears to store your IP address and, if your activity resembled a bot's, asks you to register on the second visit. The landing page you are redirected to doesn't contain the table, so table is null, and you can't call select(...) on null.
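One way to make this failure easier to diagnose is to check the selection result before using it. The sketch below is only an illustration: the class Selection and the method requireFound are hypothetical names, not part of jsoup, and the code uses nothing beyond the Java standard library. It relies on the fact that jsoup's first() returns null when the selector matched nothing.

```java
final class Selection {
    /**
     * Returns the given element unchanged, or throws an exception that names
     * the selector when nothing matched (that is, when the element is null).
     * Hypothetical helper for illustration; not a jsoup API.
     */
    static <T> T requireFound(T element, String selector) {
        if (element == null) {
            throw new IllegalStateException("No element matched '" + selector
                    + "'; the response may be a different page than expected");
        }
        return element;
    }
}
```

With jsoup it could be used as Element table = Selection.requireFound(doc.select("table#h2hSum").first(), "table#h2hSum"); so that a redirect to the registration page produces a descriptive error instead of a bare NullPointerException.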

Solution

Register for the service and add the login procedure to your code, or use proxies to switch IP addresses when you are redirected to the registration page. I'm not sure how long an IP stays blocked, but using a VPN and the following code I had no problems running 20 consecutive queries. Make sure to set a user agent, cookies, and the other header fields contained in the original site request (e.g. monitor the request with the developer/network tools in your browser):

Code

import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

String userAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36";

// First request: load the league page to obtain the session cookies.
Response res = Jsoup
        .connect("http://www.soccerpunter.com/soccer-statistics/England/Premier-League-2016-2017/")
        .followRedirects(true).userAgent(userAgent).referrer("http://www.soccerpunter.com")
        .method(Method.GET).header("Host", "www.soccerpunter.com").execute();

// Second request: reuse those cookies and send a matching referrer.
Document doc = Jsoup
        .connect("http://www.soccerpunter.com/soccer-statistics/England/Premier-League-2016-2017/head_to_head_statistics/all/676_Manchester_City_FC/661_Chelsea_FC")
        .userAgent(userAgent).timeout(10000).header("Host", "www.soccerpunter.com")
        .cookies(res.cookies())
        .referrer("http://www.soccerpunter.com/soccer-statistics/England/Premier-League-2016-2017/")
        .get();

// All cells of the first table that carries both classes.
Elements td = doc.select("table.competitionRanking.tablesorter").first().select("td");
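Selecting every td yields one flat list of cells, so to read the values row by row you still need to split that list by the number of columns (eight in the table from the question). The following is a minimal sketch of that step using only the Java standard library. TableRows and group are hypothetical names, and the cell texts are copied from the question's table; with jsoup they would presumably come from td.eachText().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TableRows {
    /** Splits a flat list of cell texts into rows of `columns` cells each. */
    static List<List<String>> group(List<String> cells, int columns) {
        // Reject inputs that cannot form complete rows.
        if (columns <= 0 || cells.size() % columns != 0) {
            throw new IllegalArgumentException(
                    cells.size() + " cells cannot be split into rows of " + columns);
        }
        List<List<String>> rows = new ArrayList<>();
        for (int i = 0; i < cells.size(); i += columns) {
            // Copy the slice so each row is independent of the input list.
            rows.add(new ArrayList<>(cells.subList(i, i + columns)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Cell texts taken from the table shown in the question.
        List<String> cells = Arrays.asList(
                "Manchester City", "140", "47", "38", "55", "188:205", "1.34", "1.46",
                "Chelsea", "140", "55", "38", "47", "205:188", "1.46", "1.34");
        for (List<String> row : group(cells, 8)) {
            System.out.println(row.get(0) + " played " + row.get(1) + ", won " + row.get(2));
        }
    }
}
```

Throwing on a cell count that is not a multiple of the column count is a design choice for this sketch: it surfaces a changed table layout early instead of silently shifting values between columns.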

Try this:

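// s is assumed to hold the fetched page's HTML as a String.
Document document = Jsoup.parse(s);
Element table = document.select("table[class=competitionRanking tablesorter]").first();
// first() returns null when no table matches, so check before iterating.
if (table != null) {
    for (Element row : table.select("tr")) {
        for (Element td : row.select("td")) {
            System.out.println(td.text());
        }
    }
}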
