Removing only an html tag and leaving behind the text inside the tag using Jsoup

I want to remove only the inner "span" tag, without removing the text inside it.

    <blockquote>
          <span>I don’t even bring up technology.</span> 
              I talk about the flow of data.&rdquo;
          <cite>&ndash;Rick Hassman, CIO, Pella</cite>
    </blockquote>

After parsing, it should look like this:

    <blockquote>
            I don’t even bring up technology.
              I talk about the flow of data.&rdquo;
          <cite>&ndash;Rick Hassman, CIO, Pella</cite>
    </blockquote>

Please help..

The simplest way to solve it is to use the String.replaceAll() method with a regular expression:

    // \b keeps the pattern from matching other tags that merely start with "span".
    String newHtml = html.replaceAll("</?\\s*span\\b[^>]*>", "");
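
For reference, here is a minimal self-contained sketch of the regex approach. The class name SpanStripper and the method stripSpans are made up for illustration, and the pattern assumes that no attribute value contains a literal `>`; for arbitrary HTML a real parser is the safer choice.

```java
import java.util.regex.Pattern;

class SpanStripper {
    // Matches an opening or closing span tag, with optional attributes.
    // \b stops the pattern from also matching tags such as <spanner>.
    private static final Pattern SPAN_TAG =
            Pattern.compile("</?\\s*span\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Removes span tags but keeps whatever was between them. */
    static String stripSpans(String html) {
        return SPAN_TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<blockquote><span>I don\u2019t even bring up technology.</span> "
                + "I talk about the flow of data.</blockquote>";
        System.out.println(stripSpans(html));
    }
}
```

Compiling the pattern once is only a small convenience here; calling String.replaceAll() directly with the same expression behaves the same way.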

If you prefer to use Jsoup, you can rebuild each span's parent element by hand, which is more involved:

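        // Needs org.jsoup.Jsoup and org.jsoup.nodes.{Document, Element, Node}.
        Document doc = Jsoup.parse(html);
        for (Element e : doc.select("span")) {
            Element parent = e.parent();
            // Skip spans whose parent was already rebuilt and detached
            // by an earlier iteration (sibling spans share one parent).
            if (parent == null || parent.parent() == null) {
                continue;
            }
            // Copy the parent's tag and attributes, then refill its children.
            Element newParent = parent.clone();
            newParent.empty();
            for (Node n : parent.childNodes()) {
                if (n instanceof Element && ((Element) n).tagName().equals("span")) {
                    // Keep only what is inside the span.
                    newParent.append(((Element) n).html());
                } else {
                    newParent.append(n.outerHtml());
                }
            }
            parent.replaceWith(newParent);
        }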
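
Jsoup also has a built-in operation for this: Elements.unwrap() removes each matched element and moves its child nodes up into the parent. Below is a minimal sketch of that approach. The class name SpanUnwrapper and the method unwrapSpans are hypothetical, and it assumes a jsoup version on the classpath that provides unwrap().

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class SpanUnwrapper {
    /** Parses an HTML fragment and unwraps every span, keeping its children. */
    static String unwrapSpans(String bodyHtml) {
        Document doc = Jsoup.parseBodyFragment(bodyHtml);
        // Removes each matched span and moves its child nodes
        // up into the span's parent, in place.
        doc.select("span").unwrap();
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<blockquote><span>I don\u2019t even bring up technology.</span> "
                + "I talk about the flow of data.&rdquo;"
                + "<cite>&ndash;Rick Hassman, CIO, Pella</cite></blockquote>";
        System.out.println(unwrapSpans(html));
    }
}
```

Because the DOM is modified in place, nested spans and several spans under the same parent are handled without rebuilding anything. Jsoup may normalize whitespace and entities when it serializes the result, so the output will not necessarily match the input character for character.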

If your tag is correct and you are asking how to do this in plain Java, something like this should help:

    String hi = "Hello World!";
    String no_o = hi.replaceAll("o", "");

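Use StringUtils#substringBetween from Apache Commons Lang; it might save you a lot of effort. Note that this handles only the first <span>...</span> pair in the source.

    // substringBetween returns null when either delimiter is missing.
    String spanText = StringUtils.substringBetween(source, "<span>", "</span>");
    // Literal replace: avoids the greedy ".+" and regex-special characters in spanText.
    String result = source.replace("<span>" + spanText + "</span>", spanText);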
