
Specifying what data to scrape - Jsoup + Android Studio

I am using Jsoup to scrape data and display it on my phone in an app built with Android Studio. My code scrapes all the <td> tags, but I don't want all of them, only certain ones in a certain order.

  </tr>
</table>
</td>
</tr><tr>
<td>
<table cellspacing='0' border='0' width='100%' >
<col align='left' /><col align='center' /><col align='right' />
  <tr>
    <td></td><td></td><td></td>

Also, when the result is displayed on my phone, the <td> tags themselves are shown, and I don't want them to be. I don't want to scrape any of the <td> tags from the HTML above.

<td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:45</font></td>

Above and below is the HTML I want to scrape.

<tr >
    <td style="border-bottom:3px solid #000000;" rowspan='1' bgcolor='#C0C0C0'><font color='#FFFFFF'>Mon</font></td>
    <td style="border-bottom:3px solid #000000;"  colspan='12' rowspan='1' >

<table  cellspacing='0' border='0' width='100%'>
  <col align='left' />
<tr>
  <td align='left'><font color='#FF0000'>Sounds</font></td>
</tr>
</table>
<table  cellspacing='0' border='0' width='100%'>
  <col align='left' />
  <col align='right' />
<tr>
  <td align='left'><font color='#000000'>P0000</font></td>
  <td align='right'><font color='#008000'>P.Man</font></td>
</tr>
</table>

What I want it to display is "Mon", then "9:00", then "Sounds", then "P0000", and then "P.Man".

This is the code I have at the moment. Does anyone have any clues? I have read the documentation.

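// Selects every <td> in the document, including the layout-only cells.
Elements tableElements = doc.select("td");
for (Element td : tableElements) {
    // Concatenating the Element calls toString(), which returns its outer HTML,
    // so the <td> markup ends up in the output.
    buffer.append("TT [" + td + "] \r\n");
    Log.d("JSwA", "TT [" + td + "]");
}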

Try this CSS selector:

#post-15 > div > table:nth-child(6) > tbody > tr:nth-child(2) > td:nth-child(2) > table:not(:last-of-type)


SAMPLE CODE

String text = doc.select("#post-15 > div > table:nth-child(6) > tbody > tr:nth-child(2) > td:nth-child(2) > table:not(:last-of-type)").text();
// text should contain "Sounds P0000 P.Man"

The code line above tells Jsoup to find all the tables containing the desired text, except the last one. Calling text() on the result returns their combined text with no markup, so the <td> tags are not displayed.
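As a rough, self-contained illustration of the same idea, the sketch below parses a trimmed copy of the "Mon" row from the question and collects only the text of the <font> elements that sit directly inside a <td>. The class name TimetableScrape, the helper cellTexts, and the "td > font" selector are assumptions made for this example and are not part of the original answer; the selector relies on the observation that, in the markup shown, every value of interest is wrapped in a <font> tag while the layout-only cells are empty.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TimetableScrape {

    // Trimmed copy of the day row from the question (hypothetical test input).
    static final String HTML =
        "<table><tr>"
      + "<td rowspan='1' bgcolor='#C0C0C0'><font color='#FFFFFF'>Mon</font></td>"
      + "<td colspan='12' rowspan='1'>"
      + "<table><tr><td align='left'><font color='#FF0000'>Sounds</font></td></tr></table>"
      + "<table><tr>"
      + "<td align='left'><font color='#000000'>P0000</font></td>"
      + "<td align='right'><font color='#008000'>P.Man</font></td>"
      + "</tr></table>"
      + "</td></tr></table>";

    // Returns the text of every <font> that is a direct child of a <td>,
    // in document order. Empty layout cells contain no <font> and are skipped.
    static List<String> cellTexts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element font : doc.select("td > font")) {
            // text() yields only the text content; appending the Element
            // itself would include the surrounding tags.
            out.add(font.text());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [Mon, Sounds, P0000, P.Man]
        System.out.println(cellTexts(HTML));
    }
}
```

On the full page the time cells ("9:00", "9:15", ...) use the same <td><font> structure, so they would match this selector too and appear wherever they occur in the document. Placing "9:00" between "Mon" and "Sounds" would need a separate selection step for the header row.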
