
How to escape HTML special characters in Java?

Is there a way to convert a string to a string that will display properly in a web document? For example, changing the string

"<Hello>"

to

"&lt;Hello&gt;"

That's usually called "HTML escaping". I'm not aware of anything in the standard libraries for doing this (though you can approximate it by using XML escaping). There are lots of third-party libraries that can do this, however. StringEscapeUtils from org.apache.commons.lang has an escapeHtml method that does this.
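If pulling in a library is not an option, the basic case from the question can be handled by hand. The following is a minimal sketch, not the Commons Lang implementation: the class name BasicHtmlEscaper and the method escape are made up for illustration, and it covers only the five characters that are significant in HTML text and quoted attribute values.

```java
// Hypothetical helper for illustration; not part of any library.
final class BasicHtmlEscaper {
    private BasicHtmlEscaper() {}

    /** Replaces &, <, >, " and ' with character references; everything else is copied as-is. */
    static String escape(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<Hello>")); // prints &lt;Hello&gt;
    }
}
```

Because each character is looked at exactly once, an ampersand that is already in the input is escaped once and the ampersands produced by the other replacements are never escaped again.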

public static String stringToHTMLString(String string) {
    StringBuilder sb = new StringBuilder(string.length());
    // true if the previous character was a space that was emitted literally
    boolean lastWasBlankChar = false;
    int len = string.length();

    for (int i = 0; i < len; i++) {
        char c = string.charAt(i);
        if (c == ' ') {
            // Spaces need extra work: replacing every space with &nbsp;
            // would disable word wrapping, so only every second space
            // in a run becomes &nbsp;.
            if (lastWasBlankChar) {
                lastWasBlankChar = false;
                sb.append("&nbsp;");
            } else {
                lastWasBlankChar = true;
                sb.append(' ');
            }
        } else {
            lastWasBlankChar = false;
            // HTML special characters
            if (c == '"') {
                sb.append("&quot;");
            } else if (c == '&') {
                sb.append("&amp;");
            } else if (c == '<') {
                sb.append("&lt;");
            } else if (c == '>') {
                sb.append("&gt;");
            } else if (c == '\n') {
                // Newline becomes a line-break tag. The tag itself must not
                // be escaped, or the page would show the literal text "<br/>".
                sb.append("<br/>");
            } else if (c < 160) {
                // Below U+00A0: append unchanged
                sb.append(c);
            } else {
                // U+00A0 and above: use a decimal numeric character reference
                sb.append("&#").append((int) c).append(';');
            }
        }
    }
    return sb.toString();
}

HTMLEntities is an open-source Java class that contains a collection of static methods (htmlentities, unhtmlentities, ...) to convert special and extended characters into HTML entities and vice versa.

http://www.tecnick.com/public/code/cp_dpage.php?aiocp_dp=htmlentities

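If you understand the logic behind it, it is easy to do this yourself. The class below converts every non-ASCII character to a numeric character reference (it leaves <, > and & unchanged):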

public class ConvertToHTMLcode {
    public static void main(String[] args) {
        String specialSymbols = "ễ%ß Straße";
        System.out.println(convertToHTMLCodes(specialSymbols)); // &#7877;%&#223; Stra&#223;e
    }

    public static String convertToHTMLCodes(String str) {
        StringBuilder sb = new StringBuilder();
        int len = str.length();
        for (int i = 0; i < len; ++i) {
            char c = str.charAt(i);
            if (c > 127) {
                // Non-ASCII: emit a decimal numeric character reference
                sb.append("&#");
                sb.append(Integer.toString(c, 10));
                sb.append(";");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
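One caveat with the char-based loop above: characters outside the Basic Multilingual Plane (most emoji, for example) are stored in a Java String as two surrogate chars, so the loop emits two references that do not name a valid character. A possible variation, sketched here under the assumption that Java 8 or later is available, iterates over code points instead. The class name NumericEntityEncoder and the method encodeNonAscii are invented for this example.

```java
// Hypothetical variation for illustration; names are not from the answer above.
final class NumericEntityEncoder {
    private NumericEntityEncoder() {}

    /** Replaces every code point above 127 with a decimal numeric character reference. */
    static String encodeNonAscii(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        // codePoints() joins each valid surrogate pair into a single int
        input.codePoints().forEach(cp -> {
            if (cp > 127) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.append((char) cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file ASCII-only.
        System.out.println(encodeNonAscii("\u1EC5%\u00DF Stra\u00DFe")); // prints &#7877;%&#223; Stra&#223;e
        System.out.println(encodeNonAscii("\uD83D\uDE00"));              // prints &#128512;
    }
}
```

Like the original, this does not escape <, > or &, so it would normally be combined with an escaping step for those characters.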


 