
Jsoup to extract data from html table

I started using JSoup today for an Android app. I have this table that I need to extract data from, but from what I can see, it's going to be tough. I need some help; the HTML for the table is as below:

<TR BGCOLOR='#999999'>
      <TD ALIGN='left'><span class='S09W80'><font color=#DDDDDD>CODE</span></TD>
      <TD ALIGN='left'><span class='S09W80'><font color=#DDDDDD>SUBJECT NAME</span></TD>
      <TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>PERIOD FROM</span></TD>
      <TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>PERIOD TO</span></TD>
      <TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>ENROL DATE</span></TD>
      <TD ALIGN='right'><span class='S09W80'><font color=#DDDDDD>GRADE</span></TD>                
</TR>

followed by repetitions of

<TR BGCOLOR='#FFFFFF'>
  <TD ALIGN='left'><span class='S09W50'>IT142</span></TD>
  <TD ALIGN='left'><span class='S09W50'>INTRODUCTION TO GRAPHICS DEVELOPMENT</span></TD>
  <TD ALIGN='right'><span class='S09W50'>21-FEB-11</span></TD>
  <TD ALIGN='right'><span class='S09W50'>17-JUN-11</span></TD>
  <TD ALIGN='right'><span class='S09W50'>22-FEB-11</span></TD>
  <TD ALIGN='center'><span class='S09W80'>B-</span></TD>
</TR>

But how do I use doc.select(...) here? What selector should I use?

This is not really an Android question, but a CSS selector question. You can read more about selectors at http://www.w3.org/TR/CSS2/selector.html

Screen scraping like this is always tricky, and there is no single "right" solution.

You will need to perform multiple select steps:

  1. Use a selector like "body > table > tr" and take the first element. This will give you the initial TR element (the header row).
  2. Validate the TR element: get its child elements and check that one of them has the text "SUBJECT NAME".
  3. Then the other TR elements can be processed in order.
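
The steps above could be sketched with Jsoup roughly as follows. This is a minimal sketch under a few assumptions, not a definitive implementation: the class name `GradeTableParser` and the method `extractRows` are made up for illustration, and the rows are assumed to sit inside a `<table>` element. It uses the descendant selector `table tr` instead of `body > table > tr`, because Jsoup's HTML parser typically inserts an implicit `<tbody>` between `table` and `tr`, in which case a strict child selector would match nothing.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class; the name is not from the original post.
public class GradeTableParser {

    /** Returns the cell texts of every row that follows the header row. */
    public static List<List<String>> extractRows(String html) {
        Document doc = Jsoup.parse(html);
        List<List<String>> rows = new ArrayList<>();
        boolean headerSeen = false;

        // Step 1: select all rows ("table tr" also matches rows under an implicit tbody).
        for (Element tr : doc.select("table tr")) {
            Elements cells = tr.select("td");

            if (!headerSeen) {
                // Step 2: look for the header row, i.e. a row with a "SUBJECT NAME" cell.
                for (Element td : cells) {
                    if ("SUBJECT NAME".equals(td.text().trim())) {
                        headerSeen = true;
                        break;
                    }
                }
                continue; // rows before the header, and the header itself, carry no data
            }

            // Step 3: process the remaining rows in order, skipping rows without cells.
            if (cells.isEmpty()) {
                continue;
            }
            List<String> values = new ArrayList<>();
            for (Element td : cells) {
                values.add(td.text().trim());
            }
            rows.add(values);
        }
        return rows;
    }
}
```

Each inner list then holds one row in column order (code, subject name, period from, period to, enrol date, grade), so `rows.get(0).get(5)` would be the grade of the first subject. If the page contains several tables, a more specific selector than `table tr` would be needed to pick the right one.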

