Extract specific HTML table content using Jsoup
I want to extract specific content from a responsive HTML table, and I am using Jsoup.
Here is the structure of my table:
<table id="main_widget_table" class="table table-striped table-hover table-condensed table-bordered">
<tbody>
<!-- ngRepeat: object in currentView --><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_BACKUP</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task backup</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: TASKMUBACKUP</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_TOTO</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task toto</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: TASKMUTOTO</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_FTP</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task ftp</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: TASKMUFTP</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_MSSQL</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task mssql</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: TASKMUMSSQL</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_ORACLE</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task oracle</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: TASKMUORA1</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_TUTU</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task tutu</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: TASKMUTUTU</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_TITI</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task titi</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: TASKMUTITI</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_WSB</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task wsb</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: MUWSB</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_SAP</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task sap</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: FRQPMDEV18</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr><tr ng-repeat="object in currentView" class="ng-scope">
<td>
<a id="main_widget_table_object_name_action" href="#//object/" target="_blank">
<b class="ng-binding">TASK_BATCH</b>
</a>
<p style="font-size:11px">
<span class="text-success" ng-show="object.label"><em class="ng-binding"> task batch</em></span>
<br ng-show="object.label">
<span ng-show="object.session" class="ng-binding" style="display: none;">
<span class="label label-default">WORKFLOW</span> <em class="ng-binding">
</em>
</span>
<br ng-show="object.session" style="display: none;">
<span ng-hide="object.session" class="ng-binding">
<span class="label label-default">JOB</span> <em class="ng-binding"></em>
</span>
<br ng-hide="object.session">
<span class="text-warning ng-binding">Location: MUFRQPMDE</span>
<span ng-show="isShowing('nextPlanified')" class="badge pull-right ng-binding" style="display: none;">
</span>
</p>
</td>
</tr>
</tbody>
</table>
I only want to extract the value between the bold tags; for instance, for the first TD the value is TASK_BACKUP.
Here is my Java code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class HtmlParser {
public static void main(String[] args) throws Exception {
Document doc = Jsoup.connect("http://frstmwarwebsrv2.orsyptst.com:9000/ui/#/en/search?searchString=TSK&filterchecks=nameSWF").get();
for (Element table : doc.select("#search_results_table")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
System.out.println(tds.get(0).text());
}
}
}
}
I am a newbie to Jsoup and my code does not display anything so far. I am using the table id to locate the table.
Thanks for your help.
FYI: My table is generated using AngularJS, so Jsoup is not the best way to extract the table data.
When using this code instead:
List<WebElement> resultsDiv = driver.findElements(By.xpath("id('search_results_table')"));
for (int i = 0; i < resultsDiv.size(); i++) {
    System.out.println(resultsDiv.get(i).getText());
    System.out.println(resultsDiv.size());
}
I still don't get the content displayed, and the size is 1! I am not sure what I am doing wrong.
Well, based on the HTML snippet you provided, the table's id is main_widget_table, not search_results_table. (The URL in your code is no longer accessible, so I can't tell if there's some other search_results_table on that page.)
You can print the text of all b tags in that table with:
for (Element e : doc.select("#main_widget_table b"))
System.out.println(e.text());
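
On the FYI in the question about the table being generated by AngularJS: the URL passed to Jsoup.connect contains a # fragment. A URL fragment is never sent to the server, and Jsoup does not execute JavaScript, so the document Jsoup receives is presumably just the static /ui/ page before Angular has rendered any rows. The sketch below uses only the JDK to show how that URL splits; the class and method names (FragmentDemo, requestedPart, fragmentOf) are made up for illustration and are not part of Jsoup.

```java
import java.net.URI;

class FragmentDemo {
    // Part of the URL that is actually sent in the HTTP request:
    // everything before the first '#'.
    static String requestedPart(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    // Fragment (the text after '#'). It stays on the client and is
    // interpreted by the page's JavaScript, here the AngularJS router.
    static String fragmentOf(String url) {
        return URI.create(url).getFragment();
    }

    public static void main(String[] args) {
        // URL taken from the question's code.
        String url = "http://frstmwarwebsrv2.orsyptst.com:9000/ui/"
                + "#/en/search?searchString=TSK&filterchecks=nameSWF";
        System.out.println("Requested: " + requestedPart(url));
        System.out.println("Fragment:  " + fragmentOf(url));
    }
}
```

If that is what happens here, the selector above would only match once the rendered HTML is available, for example by taking the page source from a real browser session after the table has been populated and handing that string to Jsoup.parse.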