
Groovy (or Java): How to escape double quotes only within HTML inner text, not in attributes

I am using an HTML rendering engine based on Groovy within a WCM system.
My use case is that the user enters rich text content in a TinyMCE-based form, and the resulting markup looks like this:

<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr,
    <span style="text-decoration: underline;">
        sed diam nonumy
    </span> eirmod "tempor" invidunt ut labore et...
</p>

In my Groovy renderer, I now want to embed this HTML snippet in the HTML document so that client-side JavaScript can process it.

What I need to do is:
Escape double quotes WITHIN content (see "tempor" token above), but not those encapsulating HTML attribute values (see "text-decoration" attribute above).

If I do

myHTML.replace("\"", "&quot;")

I will in fact escape EVERY double quote.

Any suggestions on how I can escape only the quotes WITHIN the actual text?

Converting my comment into this answer.

You can achieve this with jsoup (jsoup.org) by walking the parsed tree and changing only the text nodes. (In your sample HTML, I have added two more quoted words for the sake of testing.)

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class JSoupEscQuotes{
    // Placeholder from the Unicode private use area; assumed not to occur in the input.
    private static final String QUOTE_MARKER = "\uE000";

    public static void main( String[] args ){
        String html = "<p>Lorem ipsum \"dolor\" sit amet, consetetur sadipscing elitr,\r\n"
                + "    <span style=\"text-decoration: underline;\">\r\n"
                + "        sed \"diam\" nonumy\r\n"
                + "    </span> eirmod \"tempor\" invidunt ut labore et...\r\n"
                + "</p>";

        Document document = Jsoup.parse( html );
        markQuotes( document );

        // jsoup escapes '&' when it serializes text nodes, so writing "&quot;" straight
        // into a TextNode would be printed as "&amp;quot;". Swap the marker in afterwards.
        String escaped = document.body().html().replace( QUOTE_MARKER, "&quot;" );
        System.out.println( escaped );
    }

    // Recursively marks double quotes in text nodes only; attribute values are untouched.
    private static void markQuotes( Node node ){
        for( Node c : node.childNodes() ) {
            if( c instanceof TextNode ) {
                TextNode t = (TextNode) c;
                t.text( t.getWholeText().replace( "\"", QUOTE_MARKER ) );
            }
            else markQuotes( c );
        }
    }
}

If you are using Gradle, include jsoup like this. If you are using Maven, use the equivalent dependency configuration.

implementation 'org.jsoup:jsoup:1.14.3'
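
If adding a dependency is not an option, a rough alternative is a small hand-written scanner that tracks whether the current position is inside a tag and escapes quotes only outside of one. This is a sketch under stated assumptions, not the jsoup approach: the class name QuoteEscaper and the method escapeTextQuotes are made up for this example, and it assumes well-formed markup in which literal < characters in text are already written as &lt;, no > appears inside attribute values, and there are no comments, <script> or <style> blocks.

```java
public class QuoteEscaper {
    /**
     * Escapes double quotes that occur outside of tags.
     * Hypothetical helper: a plain character scan, not an HTML parser.
     */
    public static String escapeTextQuotes(String html) {
        StringBuilder out = new StringBuilder(html.length() + 16);
        boolean inTag = false; // true between '<' and the next '>'
        for (int i = 0; i < html.length(); i++) {
            char ch = html.charAt(i);
            if (ch == '<') {
                inTag = true;
            } else if (ch == '>') {
                inTag = false;
            }
            if (ch == '"' && !inTag) {
                out.append("&quot;"); // quote in text content
            } else {
                out.append(ch);       // everything else, including attribute quotes
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<p>eirmod \"tempor\" "
                + "<span style=\"text-decoration: underline;\">sed</span></p>";
        // prints: <p>eirmod &quot;tempor&quot; <span style="text-decoration: underline;">sed</span></p>
        System.out.println(escapeTextQuotes(html));
    }
}
```

For anything beyond simple editor output like the TinyMCE snippet in the question, a real parser such as jsoup is likely the safer choice, since the scanner has no notion of comments or script content.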


 