Jsoup ignore nested tags when using select

Question

I am trying to parse a site that uses

 <b>Header</b>Data<strong>Header</strong>Data

so I have a selector that is

.select("b, strong")

and then try to extract the text between the matches. That works fine.

Problem: Sometimes the site contains, e.g.,

<strong><strong>HeaderX</strong><br /></strong>Data

This messes up my loops because I get the text HeaderX twice. How can I ignore the nested strong?

Solved, but there is probably a better way:

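Elements selected = info.select("b, strong");
Element next = selected.first(); // null when nothing matched
Element now;
int i = 0;
while (next != null) {
    now = next;
    next = null;
    // getAllElements() returns now itself plus all of its descendants
    Elements children = now.getAllElements();
    // Skip now and every selected element nested inside it
    for (; i < selected.size(); i++) {
        if (!children.contains(selected.get(i))) {
            next = selected.get(i);
            break;
        }
    }
    // Do whatever with now & next (next is null after the last header)
}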
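
A shorter way to express the same idea, offered only as a sketch: keep each selected element only if none of its ancestors is itself a b or strong. The class name TopLevelHeaders, the helper topLevelHeaders, and the sample HTML are made up for illustration. The sketch assumes the root passed in is not itself nested inside a b or strong, and that the data is a plain text node directly after each header.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.Elements;

public class TopLevelHeaders {
    // Hypothetical helper: returns the b/strong elements under root that are
    // not nested inside another b/strong element.
    static Elements topLevelHeaders(Element root) {
        Elements result = new Elements();
        for (Element el : root.select("b, strong")) {
            boolean nested = false;
            // parents() walks from the direct parent up to the document root
            for (Element parent : el.parents()) {
                String name = parent.tagName();
                if (name.equals("b") || name.equals("strong")) {
                    nested = true;
                    break;
                }
            }
            if (!nested) {
                result.add(el);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup modelled on the snippets in the question
        String html = "<div><b>Header</b>Data"
                + "<strong><strong>HeaderX</strong><br /></strong>Data</div>";
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element header : topLevelHeaders(doc.body())) {
            // Assumption: the data is the text node right after the header
            Node after = header.nextSibling();
            String data = (after instanceof TextNode) ? ((TextNode) after).text() : "";
            System.out.println(header.text() + " -> " + data);
        }
    }
}
```

Checking ancestors keeps the outer strong and drops the inner one, so each header is visited once and the text that follows the outer element is still reachable through nextSibling().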

Answer 1

Try this:

  info.select("b,strong").remove().text();

Answer 2

You can try this:

doc.select("strong > strong, strong:last-child");

2 answers

solution1
0 2013-08-23 01:46:56

solution2
0 2013-08-23 15:54:36
