How to get the first child element of a td tag with jsoup in Java
I need help. I want to get the text of every td in the table on this page: https://www.servientrega.com/RastreoContado/RastreoContado2.faces?idGuia=2003159909&idPais=1
[Image: the table with the information I want]
However, one td tag is special: it contains a hyperlink.
I only want to get the text of the package detail table.
[Image: the special td tag]
[Image: my current result, which includes text I don't want]
EDIT:
I want to get the rows of the tbody with id="form01:tableEx4_data". The third image shows what I want.
Part of the page's HTML (https://www.servientrega.com/RastreoContado/RastreoContado2.faces?idGuia=2003159909&idPais=1):
<tbody id="form01:tableEx4_data" class="ui-datatable-data ui-widget-content">
  <tr data-ri="0" class="ui-widget-content ui-datatable-even" role="row">
    <td role="gridcell"><span class="outputText">GUIA GENERADA</span></td>
    <td role="gridcell"><span class="outputText">BOGOTA (CUNDINAMARCA)</span></td>
    <td role="gridcell"><span class="outputText">04/04/2018 17:33:05</span></td>
  </tr>
  <tr data-ri="1" class="ui-widget-content ui-datatable-odd" role="row">
    <td role="gridcell"><span class="outputText">INGRESO AL CENTRO LOGISTICO</span></td>
    <td role="gridcell"><span class="outputText">BOGOTA (CUNDINAMARCA)</span></td>
    <td role="gridcell"><span class="outputText">04/05/2018 01:35:25</span></td>
  </tr>
  <tr data-ri="2" class="ui-widget-content ui-datatable-even" role="row">
    <td role="gridcell"><a href="#" id="form01:tableEx4:2:linkDesMov11" name="form01:tableEx4:2:linkDesMov11">SALIO A CIUDAD DESTINO</a>
      <div id="form01:tableEx4:2:tooltip_linkDesMov11" class="ui-tooltip ui-widget ui-widget-content ui-shadow ui-corner-all">
        <div>
          <div style="display: none;">
            Tipo moviento: |2|
          </div>
          <table id="form01:tableEx4:2:j_id1394398698_531cdaa3" class="ui-panelgrid ui-widget dataTableEx" style="min-width: 200px; max-width: 400px;" role="grid">
            <tbody>
              <tr class="ui-widget-content" role="row">
                <td role="gridcell">
                  <table id="form01:tableEx4:2:j_id1394398698_531cda89" class="ui-panelgrid ui-widget headerClass2" style="width: 100%; min-width: 200px; max-width: 400px;" role="grid">
                    <tbody>
                      <tr class="ui-widget-content" role="row"></tr>
                      <tr class="ui-widget-content" role="row">
                        <td role="gridcell"><span style="width: 100%" class="outputText">Novedad</span></td>
                      </tr>
                    </tbody>
                  </table></td>
              </tr>
              <tr class="ui-widget-content" role="row">
                <td role="gridcell">
                  <table id="form01:tableEx4:2:j_id1394398698_531cda7d" class="ui-panelgrid ui-widget headerClass2" style="width: 100%; min-width: 200px; max-width: 400px;" role="grid">
                    <tbody>
                      <tr class="ui-widget-content" role="row"></tr>
                      <tr class="ui-widget-content" role="row">
                        <td role="gridcell"><span style="width: 30%" class="outputText">Fecha Probable Entrega</span></td>
                        <td role="gridcell"><span style="width: 70%" class="outputText">Descripción de la novedad</span></td>
                      </tr>
                    </tbody>
                  </table></td>
              </tr>
              <tr class="ui-widget-content" role="row">
                <td role="gridcell">
                  <table id="form01:tableEx4:2:j_id1394398698_531cda0f" class="ui-panelgrid ui-widget" style="width: 100%; min-width: 200px; max-width: 400px;" role="grid">
                    <tbody>
                      <tr class="ui-widget-content" role="row"></tr>
                      <tr class="ui-widget-content" role="row">
                        <td role="gridcell"><span style="width: 30%" class="outputText">07/04/2018</span></td>
                        <td role="gridcell"><span style="width: 70%" class="outputText"></span></td>
                      </tr>
                    </tbody>
                  </table></td>
              </tr>
            </tbody>
          </table>
        </div>
      </div>
I can get the td text for rows such as GUIA GENERADA, BOGOTA (CUNDINAMARCA), 04/04/2018 17:33:05 and INGRESO AL CENTRO LOGISTICO, BOGOTA (CUNDINAMARCA), 04/05/2018 01:35:25. But for the "SALIO A CIUDAD DESTINO" row, the result also contains extra details that I don't want. I only want the text "SALIO A CIUDAD DESTINO".
Maybe this can help you (it uses Selenium WebDriver):
import java.util.List;
import org.openqa.selenium.By;
import org.openqa.selenium.WebElement;

// driver: an existing WebDriver that has already loaded the tracking page
WebElement tableBody = driver.findElement(By.id("form01:tableEx4_data"));
List<WebElement> cells = tableBody.findElements(By.tagName("td"));
for (WebElement cell : cells) {
    // getText() returns the rendered text, i.e. text not hidden by CSS
    if (cell.getText().equals("SALIO A CIUDAD DESTINO")) {
        cell.click();
        break;
    }
}
If you are using jsoup, the code below might help you:
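import org.jsoup.nodes.Element;

// document: the jsoup Document for the tracking page
Element tbody = document.getElementById("form01:tableEx4_data");
for (Element row : tbody.children()) {         // direct <tr> rows only
    for (Element td : row.children()) {        // direct <td> cells, not the nested tooltip cells
        Element label = td.children().first(); // <span> or <a>; td.text() would also include the tooltip text
        System.out.println(label != null ? label.text() : td.text());
    }
}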
The output will look like this:
GUIA GENERADA
BOGOTA (CUNDINAMARCA)
04/04/2018 17:33:05
INGRESO AL CENTRO LOGISTICO
BOGOTA (CUNDINAMARCA)
04/05/2018 01:35:25
SALIO A CIUDAD DESTINO
BOGOTA (CUNDINAMARCA)
04/05/2018 22:43:17
INGRESO AL CENTRO LOGISTICO
BARRANQUILLA (ATLANTICO)
04/06/2018 23:57:50
EN ZONA DE DISTRIBUCION
BARRANQUILLA (ATLANTICO)
04/09/2018 06:24:10
REPORTADO ENTREGADO
BARRANQUILLA (ATLANTICO)
04/09/2018 12:48:58
ENTREGA VERIFICADA
BARRANQUILLA (ATLANTICO)
04/09/2018 17:54:44
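As a JDK-only point of comparison, the same "first element child of each direct cell" selection can be written as an XPath expression. This is a sketch that assumes the markup is well-formed XML; the live page is HTML and may not parse this way, so the example runs on a trimmed, hand-made excerpt of the table. The class name `FirstChildOfTd` and the `labels` helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstChildOfTd {
    // Hypothetical, trimmed excerpt of the tracking table (tooltip shortened), made well-formed XML
    static final String HTML =
          "<tbody id=\"form01:tableEx4_data\">"
        + "<tr><td><span>GUIA GENERADA</span></td><td><span>BOGOTA (CUNDINAMARCA)</span></td></tr>"
        + "<tr><td><a href=\"#\">SALIO A CIUDAD DESTINO</a>"
        + "<div><div style=\"display: none;\">Tipo moviento: |2|</div>"
        + "<table><tbody><tr><td><span>Novedad</span></td></tr></tbody></table></div></td>"
        + "<td><span>BOGOTA (CUNDINAMARCA)</span></td></tr>"
        + "</tbody>";

    static List<String> labels(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Direct-child steps (/tr/td) skip the cells of the nested tooltip table;
            // *[1] selects the first element child (<span> or <a>) of each cell.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//tbody[@id='form01:tableEx4_data']/tr/td/*[1]", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse or query the excerpt", e);
        }
    }

    public static void main(String[] args) {
        for (String label : labels(HTML)) {
            System.out.println(label);
        }
    }
}
```

The important part is the location path: `/tr/td` only steps to direct children, which keeps the tooltip's nested cells out, and `*[1]` takes the first element child of each cell, the same idea as `td.children().first()` in jsoup.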
However, if you are using another library, you can also extract the text inside the a tag with a regular expression:
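import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "<a href=\"https://www.servientrega.com/RastreoContado/RastreoContado2.faces?idGuia=2003159909&idPais=1#\" id=\"form01:tableEx4:2:linkDesMov11\" name=\"form01:tableEx4:2:linkDesMov11\">SALIO A CIUDAD DESTINO</a>";
final Pattern pattern = Pattern.compile(">(.+?)<"); // lazily captures the first non-empty text between '>' and '<'
final Matcher matcher = pattern.matcher(s);
if (matcher.find()) { // group(1) would throw IllegalStateException without a successful find()
    System.out.println(matcher.group(1));
}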
In this case, the output is:
SALIO A CIUDAD DESTINO
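A single `find()` call stops at the first match. To collect the text of every link in a larger chunk of markup, one could loop over `find()` with a pattern anchored on the `a` tag itself. This is a sketch, not a general HTML parser: it assumes the link text contains no nested tags, and the names `AnchorTexts` and `anchorTexts` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorTexts {
    // Matches an opening <a ...> tag, captures the text up to the next '<', then requires </a>.
    // Assumes the link text contains no nested tags.
    private static final Pattern ANCHOR = Pattern.compile("<a\\b[^>]*>([^<]*)</a>");

    static List<String> anchorTexts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) { // find() returns false once no further match exists
            texts.add(m.group(1).trim());
        }
        return texts;
    }

    public static void main(String[] args) {
        String cell = "<td role=\"gridcell\"><a href=\"#\" id=\"form01:tableEx4:2:linkDesMov11\">SALIO A CIUDAD DESTINO</a>"
                + "<div class=\"ui-tooltip\">Tipo moviento: |2|</div></td>";
        System.out.println(anchorTexts(cell)); // prints [SALIO A CIUDAD DESTINO]
    }
}
```

Because the pattern requires the closing `</a>`, text in the tooltip `div` after the link is never captured.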