
Java: how to encode single and double quotes as HTML entities?

How can I encode " into &quot; and ' into &#39;?

I am quite surprised that the single quote and double quote are not defined in HTML Entities 4.0, so StringEscapeUtils cannot escape these two characters into their respective entities.

Is there any other string-related tool that can do this?

Is there any reason why the single quote and double quote are not defined in HTML Entities 4.0?

Besides the single quote and double quote, is there any framework that can encode every Unicode character into its respective entity? Since any Unicode character can be manually translated into a decimal entity and displayed in HTML, I wonder whether there is a tool that does the conversion automatically.

  1. Single quote and double quote not defined in HTML 4.0

Only the single quote is undefined in HTML 4.0; the double quote has been defined as &quot; since HTML 2.0.

  2. StringEscapeUtils not able to escape these 2 characters into respective entities

escapeXml11 in StringEscapeUtils converts the single quote into &apos;, and escapeHtml4 converts the double quote into &quot;.

For example:

StringEscapeUtils.escapeXml11("'"); // returns &apos;
StringEscapeUtils.escapeHtml4("\""); // returns &quot;

  3. Is there any other String related tool able to do this?

HtmlUtils from the Spring Framework handles both single and double quotes, and it can also convert characters to decimal references (such as &#39; and &#34;). The following example is taken from an answer to a related question:

import org.springframework.web.util.HtmlUtils;
[...]
HtmlUtils.htmlEscapeDecimal("&"); // gives &#38;
HtmlUtils.htmlEscape("&"); // gives &amp;

  4. Any reason why single quote and double quote is not defined in HTML Entities 4.0?

According to "Character entity references in HTML 4", the single quote is not defined. The double quote (&quot;) has been available since HTML 2.0, whereas the single quote (&apos;) is only supported as part of XHTML 1.0.
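
Because &apos; is missing from HTML 4, a numeric reference is the safer choice for the single quote when the output has to work in both HTML 4 and XHTML. The following is only a minimal sketch using the plain JDK. The class name QuoteEscapeSketch and the method escapeQuotes are made up for illustration and do not belong to any library mentioned here.

```java
public class QuoteEscapeSketch {

    // Hypothetical helper: rewrites only the two quote characters.
    // The double quote uses the named entity defined since HTML 2.0;
    // the single quote uses a decimal reference because HTML 4 has no &apos;.
    public static String escapeQuotes(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':
                    out.append("&quot;");
                    break;
                case '\'':
                    out.append("&#39;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: It&#39;s &quot;good&quot;
        System.out.println(escapeQuotes("It's \"good\""));
    }
}
```

Note that this sketch leaves &, < and > untouched, so it is not a complete HTML escaper on its own.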

  5. Tool or method to encode all the unicode character into respective entities

There is a very good and simple Java implementation given in an answer to a related question.

The following sample program is based on that answer:

import org.apache.commons.lang3.StringEscapeUtils;

public class HTMLCharacterEscaper {
    public static void main(String[] args) {        
        //With StringEscapeUtils
        System.out.println("Using SEU: " + StringEscapeUtils.escapeHtml4("\" ¶"));
        System.out.println("Using SEU: " + StringEscapeUtils.escapeXml11("'"));

        //Single quote & double quote
        System.out.println(escapeHTML("It's good"));
        System.out.println(escapeHTML("\" Grit \""));

        //Unicode characters
        System.out.println(escapeHTML("This is copyright symbol ©"));
        System.out.println(escapeHTML("Paragraph symbol ¶"));
        System.out.println(escapeHTML("This is pound £"));      
    }

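    // Escapes every non-ASCII char, plus ", <, >, & and ', as a decimal character reference.
    // Note: the loop walks UTF-16 code units, so a character outside the Basic Multilingual
    // Plane is written as two surrogate references rather than one.
    public static String escapeHTML(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 127 || c == '"' || c == '<' || c == '>' || c == '&' || c == '\'') {
                // Emit the numeric form, e.g. ' becomes &#39;
                out.append("&#");
                out.append((int) c);
                out.append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }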

}
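
One limitation of the escapeHTML method above is that it reads the string one char at a time, so a character outside the Basic Multilingual Plane (an emoji, for instance) ends up as two surrogate references instead of one. A possible variant that iterates over code points is sketched below. The names CodePointEscapeSketch and escapeHtmlByCodePoint are hypothetical, and the escaping rule is the same as in the sample program.

```java
public class CodePointEscapeSketch {

    // Same rule as escapeHTML, but applied per Unicode code point
    // so that supplementary characters yield a single decimal reference.
    public static String escapeHtmlByCodePoint(String s) {
        StringBuilder out = new StringBuilder(Math.max(16, s.length()));
        s.codePoints().forEach(cp -> {
            if (cp > 127 || cp == '"' || cp == '<' || cp == '>' || cp == '&' || cp == '\'') {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: It&#39;s &#169;
        System.out.println(escapeHtmlByCodePoint("It's \u00A9"));
        // U+1F600 is stored as the surrogate pair D83D DE00; prints: &#128512;
        System.out.println(escapeHtmlByCodePoint("\uD83D\uDE00"));
    }
}
```

For text that stays within the Basic Multilingual Plane, both versions should produce identical output.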

