I am trying to parse a site that uses

    <b>Header</b>Data<strong>Header</strong>Data

so I use the selector

    .select("b, strong")

and then extract the text between the headers. Everything works fine.
The problem: sometimes the site has, e.g.,

    <strong><strong>HeaderX</strong><br /></strong>Data

This breaks my loop, since I get the text HeaderX twice. How can I ignore the nested strong element?
Solved, but there is probably a better way:
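    Elements selected = info.select("b, strong");
    Element next = selected.isEmpty() ? null : selected.get(0);
    int i = 0;
    while (next != null) {
        Element now = next;
        next = null;
        // getAllElements() returns now itself plus all of its descendants
        Elements inside = now.getAllElements();
        // Skip selected elements nested inside now; stop at the first one that is not
        for (i++; i < selected.size(); i++) {
            if (!inside.contains(selected.get(i))) {
                next = selected.get(i);
                break;
            }
        }
        // Do whatever with now & next (next is null for the last header)
    }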
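Another way to handle the nesting, offered here as a sketch and not as the poster's method, is to first drop every selected element that has an ancestor in the same selection, and then run the simple header/data loop on what remains. The class Outermost and its method outermost are hypothetical names made up for illustration. The helper only uses java.util, so it does not depend on a particular jsoup version; it takes a parent-lookup function, which with jsoup would presumably be Element::parent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: keeps only the "outermost" items of a selection.
final class Outermost {
    private Outermost() {}

    // Returns the items that have no ancestor among the other items, in original order.
    // parentOf must return an item's parent, or null once the root is reached.
    public static <T> List<T> outermost(List<T> items, Function<T, T> parentOf) {
        // Identity-based set, so membership does not depend on how T implements equals()
        Set<T> selected = Collections.newSetFromMap(new IdentityHashMap<>());
        selected.addAll(items);
        List<T> result = new ArrayList<>();
        for (T item : items) {
            boolean nested = false;
            // Walk up the ancestor chain; stop as soon as a selected ancestor is found
            for (T p = parentOf.apply(item); p != null; p = parentOf.apply(p)) {
                if (selected.contains(p)) {
                    nested = true;
                    break;
                }
            }
            if (!nested) {
                result.add(item);
            }
        }
        return result;
    }
}
```

With jsoup, usage would look something like List<Element> headers = Outermost.outermost(info.select("b, strong"), Element::parent); and for the nested example above it should keep the outer strong and drop the inner one, so HeaderX is seen only once. The identity-based set is a deliberate choice: it avoids assuming anything about Element.equals semantics.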
Try this:

    info.select("b,strong").remove().text();
You could try:

    doc.select("strong > strong, strong:last-child");