
How to remove only text between two different Tags using JSoup

<div class="orcl6w2">
  <div class="orcl6w3">
    <table >
      <tbody>
        <tr>
          <td>
            <table>
              <tbody>
                <tr>
                  <td>
                    <center>
                      <strong>As Published In </strong>                                      
                    </center>
                  </td>
                </tr>
              </tbody>
            </table>
            <h2>DEVELOPER: PL/SQL Practices</h2>
            <hr />
            <strong>Steven Feuerstein </strong>This has to be deleted                                                        
            <em>Oracle PL/SQL Programming</em>This has to be deleted too.                                                             
          </td>
        </tr>
      </tbody>
    </table>
  </div>
</div>

Here I want to delete everything after the hr tag, both the tags and the text, i.e.:

<hr />
<strong>Steven Feuerstein </strong>This has to be deleted                                                        
<em>Oracle PL/SQL Programming</em>This has to be deleted too.

I tried the code below, but it only deletes the tags that come after the hr tag. It does not delete the text, i.e. "This has to be deleted" and "This has to be deleted too."

if (elements.select("hr").size() > 0) {
    final Element hrfound = elements.select("hr").last();
    final int hrIdx = hrfound.siblingIndex();

    for (Element e : hrfound.siblingElements()) {
        if (e.siblingIndex() > hrIdx) {
            e.remove();
        }
    }
} 

Please help....

The method you are using, hrfound.siblingElements(), returns only Element objects, but the text you also want to delete is held in text nodes (TextNode, a subclass of Node), so your loop never sees it. I also wouldn't remove the siblings by index. Instead, after finding the hr element, call nextSibling() on it; that method returns a Node, so it picks up both the elements and the text nodes that follow.

The code below should accomplish what you are trying to do:

while(hrfound.nextSibling() != null) {
    hrfound.nextSibling().remove();
} 
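
For completeness, here is a minimal self-contained sketch of the same loop that you can run on its own. The class name RemoveAfterHr, the helper stripAfterLastHr, and the shortened sample HTML are made up for illustration; only the jsoup calls (Jsoup.parseBodyFragment, select, last, nextSibling, remove) come from the library. It assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;

class RemoveAfterHr {

    // Hypothetical helper: removes every sibling node (elements AND text nodes)
    // that follows the last <hr> in the given HTML fragment.
    static String stripAfterLastHr(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        Element hr = doc.select("hr").last(); // null when there is no <hr>
        if (hr != null) {
            Node next;
            // nextSibling() returns a Node, so text nodes are included
            while ((next = hr.nextSibling()) != null) {
                next.remove();
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        // Shortened stand-in for the markup in the question
        String html = "<div><h2>Title</h2><hr>"
                + "<strong>Name </strong>trailing text"
                + "<em>Book</em>more text</div>";
        System.out.println(stripAfterLastHr(html));
    }
}
```

The h2 and the hr stay in place; the strong and em elements and the loose text after them are removed. If you need to keep the hr's earlier siblings untouched but remove the hr as well, call hr.remove() after the loop.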
