Android Jsoup, how to parse table?

I'm trying to grab a table from a webpage, but I can't seem to get it working properly.

<table cellpadding="0" cellspacing="0" border="0" class="pricetable sortable" id="sortabletable">
    <thead class="tableheader">
        <tr class="sortbottom">
            <th class="thtableheaderlogo unsortable">&nbsp;</th>
            <th class="thtableheaderprice"><div class="tableheaderprice">Pris</div></th>
            <th class="thtableheaderaddress"><div class="tableheaderaddress">Adresse</div></th>
            <th class="thtableheaderobserved unsortable"><div class="tableheaderobserved">Tidspunkt</div></th>
        </tr>
    </thead>
    <tfoot>
        <tr class="unsortable">
            <td colspan="4"><br />* Denne pris er indberettet af selskabet <a style="margin-left: 40px;" href="/indberet">Indberet pris</a></td>
        </tr>
    </tfoot>
    <tbody id="list_canvas">
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/f24.jpg" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/f24/f24-frederiksborgvej-1" class="octanelink">10.57</a></td>
            <td class="tablebodyaddress" title="Frederiksborgvej 1 3600 Frederikssund">&nbsp;<a href="/f24/f24-frederiksborgvej-1" class="octanelink">Frederiksborgvej 1 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/f24/f24-frederiksborgvej-1" class="octanelink">1 time 57 m </a></td>
        </tr>
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/q8.gif" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/q8/q8-jernbanegade-43" class="octanelink">10.67</a></td>
            <td class="tablebodyaddress" title="Jernbanegade 43 3600 Frederikssund">&nbsp;<a href="/q8/q8-jernbanegade-43" class="octanelink">Jernbanegade 43 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/q8/q8-jernbanegade-43" class="octanelink">1 time 57 m </a></td>
        </tr>
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/shell.gif" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/shell/shell-ny-%C3%B8stergade-12" class="octanelink">11.87</a></td>
            <td class="tablebodyaddress" title="Ny Østergade 12 3600 Frederikssund">&nbsp;<a href="/shell/shell-ny-%C3%B8stergade-12" class="octanelink">Ny Østergade 12 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/shell/shell-ny-%C3%B8stergade-12" class="octanelink">1 time 57 m </a></td>
        </tr>
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/shell.gif" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/shell/shell-askelundsvej-1" class="octanelink">11.87</a></td>
            <td class="tablebodyaddress" title="Askelundsvej 1 3600 Frederikssund">&nbsp;<a href="/shell/shell-askelundsvej-1" class="octanelink">Askelundsvej 1 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/shell/shell-askelundsvej-1" class="octanelink">1 time 57 m </a></td>
        </tr>
        <tr>
            <td class="tablebodylogo"><img src="/sites/all/themes/benzinpriser/logo/circlek.png" alt="" style="width:32px; height: 18px;" /></td>
            <td class="tablebodyprice">&nbsp;<a href="/circle-k/circle-k-servicenter-frederiksv%C3%A6rkvej-16" class="octanelink">10.00</a></td>
            <td class="tablebodyaddress" title="Frederiksværkvej 16 3600 Frederikssund">&nbsp;<a href="/circle-k/circle-k-servicenter-frederiksv%C3%A6rkvej-16" class="octanelink">Frederiksværkvej 16 3600 Frederikssund</a></td>
            <td class="tablebodydate"><a href="/circle-k/circle-k-servicenter-frederiksv%C3%A6rkvej-16" class="octanelink">1 time 57 m </a></td>
        </tr>
    </tbody>
</table>

I'm trying to grab the price and address from each row of the table.

Here is my current code.

package com.example.android.soup;

import android.os.Bundle;
import android.support.v7.app.AppCompatActivity;
import android.view.View;
import android.widget.TextView;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class MainActivity extends AppCompatActivity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_main);
    }

    public void fetch(View View){
        String sNodes = "";
        TextView text = (TextView) findViewById(R.id.text1234);
        try
        {
            Document doc = Jsoup.parse("http://www.fdmbenzinpriser.dk/searchprices/1/3600");
            System.out.println(doc.getElementById("list_canvas"));
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        text.setText(sNodes);
    }
}

Jsoup.parse() parses a document from a String of HTML ( https://jsoup.org/cookbook/input/parse-document-from-string ). You passed it a URL, which is not an HTML string, so nothing is ever downloaded and the table is not in the resulting document. You have to fetch the data from the URL with connect(...).get(). That's the problem. Here is a working example:

Document doc = Jsoup.connect("http://www.fdmbenzinpriser.dk/searchprices/1/3600").get();
System.out.println(doc.getElementById("list_canvas"));

https://jsoup.org/cookbook/input/load-document-from-url
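To see why the original call printed null, here is a minimal sketch that needs no network access. It assumes only that jsoup is on the classpath; the class name ParseVersusConnect, the helper parseAsHtml, and the one-row HTML string are hypothetical stand-ins, not part of the real page or of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class ParseVersusConnect {

    // Jsoup.parse(String) treats its argument as HTML markup, never as an address to fetch.
    static Document parseAsHtml(String input) {
        return Jsoup.parse(input);
    }

    public static void main(String[] args) {
        String url = "http://www.fdmbenzinpriser.dk/searchprices/1/3600";

        // The URL ends up as plain body text; no request is made, so the table is absent.
        Document fromUrlString = parseAsHtml(url);
        System.out.println(fromUrlString.body().text());
        System.out.println(fromUrlString.getElementById("list_canvas") == null);

        // Real markup parses as expected (a trimmed, hypothetical stand-in for the page).
        String html = "<table><tbody id=\"list_canvas\"><tr><td>10.57</td></tr></tbody></table>";
        Document fromMarkup = parseAsHtml(html);
        System.out.println(fromMarkup.getElementById("list_canvas") != null);
    }
}
```

The first document contains nothing but the URL text, which is why getElementById("list_canvas") has nothing to return until the page is actually loaded with connect(...).get().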

Since you are really interested in the cells inside the tbody tag, you can try:

// Requires: import org.jsoup.nodes.Element; import org.jsoup.select.Elements;
final Elements tbodyElements = doc.getElementsByTag("tbody");
for( int x = 0; x < tbodyElements.size(); x++ )
{
    if( tbodyElements.get(x).attr("id").equals("list_canvas") )
    {
        // You know you are inside the right tbody tag; find all the td elements in it
        final Elements tdElems = tbodyElements.get(x).getElementsByTag("td");
        for( int y = 0; y < tdElems.size(); y++ )
        {
             final Element tdElem = tdElems.get(y);
             // attr() returns a String, so test the CSS class with hasClass() instead
             if( tdElem.hasClass("tablebodylogo") )
             {
                 // this will get you tags within tablebodylogo
                 final Elements childrenTDLogo = tdElem.children();
             }
             else if( tdElem.hasClass("tablebodyprice") )
             {
                 // this will get you tags within tablebodyprice
                 final Elements childrenTDPrice = tdElem.children();
             }
             else if( tdElem.hasClass("tablebodyaddress") )
             {
                 // this will get you tags within tablebodyaddress
                 final Elements childrenTDAddress = tdElem.children();
             }
             else if( tdElem.hasClass("tablebodydate") )
             {
                 // this will get you tags within tablebodydate
                 final Elements childrenTDDate = tdElem.children();
             }
        }
    }
}
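The same traversal can also be written with jsoup's CSS selectors, which may be shorter when only the price and address are needed. This is a sketch under assumptions rather than a definitive implementation: the class name PriceTableSelectors, the method extractRows, and the "price | address" output format are made up for illustration, and the sample HTML is a trimmed copy of two rows from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class PriceTableSelectors {

    // Returns one "price | address" line per row of the tbody with id "list_canvas".
    static List<String> extractRows(Document doc) {
        List<String> rows = new ArrayList<>();
        for (Element row : doc.select("tbody#list_canvas > tr")) {
            // The price is the text of the link inside the price cell.
            String price = row.select("td.tablebodyprice a").text();
            // The address is also available as the title attribute of its cell.
            String address = row.select("td.tablebodyaddress").attr("title");
            rows.add(price + " | " + address);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Trimmed sample markup; in the app this would come from Jsoup.connect(url).get().
        String html = "<table><tbody id=\"list_canvas\">"
                + "<tr><td class=\"tablebodyprice\">&nbsp;<a href=\"/f24/f24-frederiksborgvej-1\">10.57</a></td>"
                + "<td class=\"tablebodyaddress\" title=\"Frederiksborgvej 1 3600 Frederikssund\">"
                + "<a href=\"/f24/f24-frederiksborgvej-1\">Frederiksborgvej 1 3600 Frederikssund</a></td></tr>"
                + "<tr><td class=\"tablebodyprice\">&nbsp;<a href=\"/q8/q8-jernbanegade-43\">10.67</a></td>"
                + "<td class=\"tablebodyaddress\" title=\"Jernbanegade 43 3600 Frederikssund\">"
                + "<a href=\"/q8/q8-jernbanegade-43\">Jernbanegade 43 3600 Frederikssund</a></td></tr>"
                + "</tbody></table>";

        for (String line : extractRows(Jsoup.parse(html))) {
            System.out.println(line);
        }
    }
}
```

Selecting the link inside the price cell, rather than the cell itself, avoids picking up the leading non-breaking space that the page places before each price.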

Referring to the official jsoup documentation will greatly improve your understanding of how to work with org.jsoup.nodes.Element and org.jsoup.select.Elements. It's an amazing library for parsing HTML documents, though I don't think it's the best one for grabbing online HTML pages. Still, I hope this helps. Requests for clarification are welcome.
