
Jsoup - How to detect strictly adjacent elements - check if element has been removed

I need to detect strictly adjacent elements with jsoup. The question "How to detect strictly adjacent siblings" provides an example I could use, but I need a working example for Jsoup in Java.


<div id="container">
    <span class="highlighted">Paragraph 1</span>
    <span class="highlighted">Paragraph 2</span>
    This is just loose text.
    <p class="highlighted">Paragraph 3</p>
</div>

What I'm trying to accomplish is to build a single element containing the text of all adjacent sibling elements of the same kind.

private String removeSimilarTags(String htmlContent) {
        org.jsoup.nodes.Document doc = Jsoup.parse(htmlContent);

        // Select all spans with the class "highlighted"
        Elements highlightedSpanElements = doc.select("span.highlighted");
        for (Element span : highlightedSpanElements) {
            Element beforeEl = span.previousElementSibling();
            // I need another function here to verify whether the element has already been removed
            if (span != null) {
                beforeEl.after("<span class='" + HIGHLIGHT + "'>" + mergeAdjacentSpans(span) + "</span>");
            }
        }
        return doc.outerHtml();
}

private String mergeAdjacentSpans(Element span) {
        Element nextEl = span.nextElementSibling();
        String text = span.text();
        if (nextEl != null && nextEl.tagName().equalsIgnoreCase(SPAN_TAG)
                           && nextEl.classNames().contains(HIGHLIGHT)) {
            // The next element is also a highlighted span, so keep collecting text from it
            text = text.concat(" " + mergeAdjacentSpans(nextEl));
        }
        return text;
}

I would also like some insight into how to verify that an element has already been removed. I cannot find a clear answer online.

Thank you guys !

To detect whether elements are strictly adjacent, you need to know the difference between Node and Element in Jsoup: https://stackoverflow.com/questions/47881838/difference-between-jsoup-element-and-jsoup-node#:~:text=A%20node%20is%20the%20generic,Node . In my case I used Node, because nextSibling() returns whatever comes next, whether that is a text string or an actual element, so it is not limited to tagged elements.
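To make the Node versus Element distinction concrete, here is a minimal sketch that prints what `nextSibling()` and `nextElementSibling()` report for each highlighted span in markup like the question's. The class name `SiblingDemo` and the helper `describeNextNode` are made up for illustration, and the sample HTML is a condensed version of the question's snippet.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class SiblingDemo {
    // Condensed version of the markup from the question (illustrative).
    static final String HTML =
        "<div id=\"container\">"
      + "<span class=\"highlighted\">Paragraph 1</span> "
      + "<span class=\"highlighted\">Paragraph 2</span>"
      + " This is just loose text. "
      + "<p class=\"highlighted\">Paragraph 3</p>"
      + "</div>";

    // Hypothetical helper: describes the node that directly follows an element.
    static String describeNextNode(Element el) {
        Node next = el.nextSibling();
        if (next == null) {
            return "nothing";
        }
        if (next instanceof TextNode) {
            // isBlank() is true when the text node holds only whitespace
            return ((TextNode) next).isBlank() ? "blank text" : "text";
        }
        return next.nodeName();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        for (Element span : doc.select("span.highlighted")) {
            Element nextEl = span.nextElementSibling();
            System.out.println(span.text()
                + " | nextSibling: " + describeNextNode(span)
                + " | nextElementSibling: " + (nextEl == null ? "none" : nextEl.tagName()));
        }
    }
}
```

The second span is the interesting case: `nextElementSibling()` skips straight to the `<p>`, while `nextSibling()` exposes the loose text in between, which is what tells you the two elements are not strictly adjacent.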

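private boolean isNextSiblingAdjacent(Element span) {
  Node informationAfterNode = span.nextSibling();
  Element nextTaggedElement = span.nextElementSibling();
  if (informationAfterNode == null || nextTaggedElement == null) {
    return false; // nothing follows, so there is nothing to be adjacent to
  }
  // The snippet is cut off after "||"; the second clause is inferred from the description below
  return informationAfterNode.outerHtml().trim().length() == 0 ||
         informationAfterNode.outerHtml().equals(nextTaggedElement.outerHtml());
}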

The first condition verifies that the next node contains only whitespace. You can also check whether it starts with <!-- and ends with --> to detect a comment, since in both of these cases the elements still count as adjacent. Last but not least, check whether the HTML of the node is the same as that of the next element.
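Putting these checks together, the following is one possible sketch rather than a definitive implementation. It skips blank text nodes and comments when looking for the next highlighted span, merges each run into its first span, and covers the "has it been removed?" part of the question by relying on the fact that a node detached with `remove()` no longer has a parent. The class name `AdjacentSpanMerger`, both method names, and the hard-coded `span` / `highlighted` values are assumptions made for the example. Comparing node types with `instanceof` is swapped in here for the string checks on `outerHtml()` described above.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class AdjacentSpanMerger {

    // Returns the next highlighted span if only blank text or comments
    // separate it from the given span; returns null otherwise.
    static Element nextAdjacentSpan(Element span) {
        Node node = span.nextSibling();
        while (node != null) {
            boolean blankText = node instanceof TextNode && ((TextNode) node).isBlank();
            if (!blankText && !(node instanceof Comment)) {
                break; // first node that is neither whitespace nor a comment
            }
            node = node.nextSibling();
        }
        if (node instanceof Element) {
            Element el = (Element) node;
            if (el.tagName().equals("span") && el.hasClass("highlighted")) {
                return el;
            }
        }
        return null;
    }

    // Merges every run of strictly adjacent highlighted spans into the first span of the run.
    static String mergeAdjacentSpans(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element span : doc.select("span.highlighted")) {
            // A span removed in an earlier iteration is detached, so its parent is null.
            if (span.parent() == null) {
                continue;
            }
            Element next;
            while ((next = nextAdjacentSpan(span)) != null) {
                span.text(span.text() + " " + next.text());
                next.remove();
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div id=\"container\">"
            + "<span class=\"highlighted\">Paragraph 1</span> "
            + "<span class=\"highlighted\">Paragraph 2</span>"
            + " This is just loose text. "
            + "<p class=\"highlighted\">Paragraph 3</p>"
            + "</div>";
        System.out.println(mergeAdjacentSpans(html));
    }
}
```

Iterating over the result of `select` while removing elements is safe in this sketch because the selection is a separate list; removing a node from the document does not change that list, which is exactly why the `parent() == null` check is needed to skip spans that were already merged away.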

