
Specifying what data to scrape - Jsoup + Android Studio

I am using Jsoup to scrape data and display it on my phone, using Android Studio. I have code that scrapes all the <td> tags, but I don't want all of them, just certain ones in a certain order.

  </tr>
</table>
</td>
</tr><tr>
<td>
<table cellspacing='0' border='0' width='100%' >
<col align='left' /><col align='center' /><col align='right' />
  <tr>
    <td></td><td></td><td></td>

Also, when the output is displayed on my phone, the <td> tags themselves are shown, which I don't want. I don't want to scrape any of the <td> tags from the HTML above.

<td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:00</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:15</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:30</font></td>
    <td   bgcolor='#C0C0C0' colspan='1'><font color='#FFFFFF'>9:45</font></td>

Above and below is the HTML I want to scrape.

<tr >
    <td style="border-bottom:3px solid #000000;" rowspan='1' bgcolor='#C0C0C0'><font color='#FFFFFF'>Mon</font></td>
    <td style="border-bottom:3px solid #000000;"  colspan='12' rowspan='1' >

<table  cellspacing='0' border='0' width='100%'>
  <col align='left' />
<tr>
  <td align='left'><font color='#FF0000'>Sounds</font></td>
</tr>
</table>
<table  cellspacing='0' border='0' width='100%'>
  <col align='left' />
  <col align='right' />
<tr>
  <td align='left'><font color='#000000'>P0000</font></td>
  <td align='right'><font color='#008000'>P.Man</font></td>
</tr>
</table>

What I want it to display is "Mon", then "9:00", then "Sounds", then "P0000", and then "P.Man".

This is the code I have at the moment. Does anyone have any clues? I have read the documentation.

                Elements tableElements = doc.select("td");
                for (Element td : tableElements) {
                    buffer.append("TT [" + td + "] \r\n");
                    Log.d("JSwA", "TT [" + td + "]");
                }
            }

Try this CSS selector:

#post-15 > div > table:nth-child(6) > tbody > tr:nth-child(2) > td:nth-child(2) > table:not(:last-of-type)


SAMPLE CODE

String text = doc.select("#post-15 > div > table:nth-child(6) > tbody > tr:nth-child(2) > td:nth-child(2) > table:not(:last-of-type)").text();
// text should contain "Sounds P0000 P.Man"

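The selector above tells Jsoup to select every table inside the target cell except the last one; these are the tables that contain the desired text. Calling text() on the result returns only the combined text of those tables, without any tags.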
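The <td> tags show up in the question's output because appending an Element to a string uses its outer HTML; calling text() on each element returns only the text. Below is a minimal sketch of collecting the values in the requested order. It assumes a trimmed-down, hypothetical copy of the page: the id "timetable", the class name TimetableScrape and the extract helper are illustrative and not part of the real site, so the selectors will need adjusting to the actual markup.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

class TimetableScrape {
    // Hypothetical, simplified stand-in for the timetable in the question.
    static final String HTML =
          "<table id='timetable'>"
        + "<tr><td></td><td><font>9:00</font></td><td><font>9:15</font></td></tr>"
        + "<tr>"
        + "<td><font>Mon</font></td>"
        + "<td colspan='12'>"
        + "<table><tr><td><font>Sounds</font></td></tr></table>"
        + "<table><tr><td><font>P0000</font></td><td><font>P.Man</font></td></tr></table>"
        + "</td>"
        + "</tr>"
        + "</table>";

    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);

        // Direct rows of the outer table only; Jsoup adds the implied <tbody>.
        Elements rows = doc.select("#timetable > tbody > tr");
        Element timeRow = rows.get(0);
        Element dayRow = rows.get(1);

        List<String> values = new ArrayList<>();
        values.add(dayRow.child(0).text());   // day label, first cell of the day row
        values.add(timeRow.child(1).text());  // first time slot, second cell of the header row

        // Cells of the nested tables inside the second cell, in document order.
        for (Element td : dayRow.child(1).select("table td")) {
            values.add(td.text());            // text() drops the tags, unlike "" + td
        }
        return values;
    }

    public static void main(String[] args) {
        for (String value : extract(HTML)) {
            System.out.println(value);
        }
    }
}
```

In the Android code from the question, the same idea amounts to appending td.text() instead of td to the buffer.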
