I have a string with an emoji in it:
I love 🍿
I need to escape that popcorn emoji with its HTML entity so that I get:
I love &#x1f37f;
I am writing my code in Java and have tried several StringEscapeUtils libraries, but I haven't gotten any of them to work. Please help me figure out what I can use to escape special characters like the popcorn emoji.
For reference:
It's a little hacky, because I don't believe there is a ready-made library to do this. Assuming you can't simply serve your HTML page as UTF-8 (or UTF-16), which should render 🍿 as is, you can use Character.codePointAt(CharSequence, int) and Character.offsetByCodePoints(CharSequence, int, int) [1] to perform the conversion whenever a character is outside the ASCII range. Something like:
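String str = "I love 🍿";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
    char ch = str.charAt(i);
    if (ch > 127) {
        // Non-ASCII: emit the full code point as a hexadecimal reference.
        sb.append(String.format("&#x%x;", Character.codePointAt(str, i)));
        // offsetByCodePoints returns the index of the next code point (not a delta),
        // so assign it; the -1 compensates for the loop's i++.
        i = Character.offsetByCodePoints(str, i, 1) - 1;
    } else {
        sb.append(ch);
    }
}
System.out.println(sb);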
which outputs (as requested)
I love &#x1f37f;
[1] Edited based on helpful comments from Andreas.
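As a variation on the loop above, here is a minimal sketch that reads one code point at a time and advances by Character.charCount, so a surrogate pair in the middle of a string is handled the same way as one at the end. The class and method names are hypothetical, chosen only for illustration.

```java
// Hypothetical helper class; the names are illustrative and not from any library.
final class NumericEntityEscaper {

    // Replaces every code point above 127 with a hexadecimal character reference.
    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int codePoint = input.codePointAt(i);
            if (codePoint > 127) {
                out.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            } else {
                out.append((char) codePoint);
            }
            // charCount is 2 for supplementary code points (surrogate pairs), else 1.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \uD83C\uDF7F is the popcorn emoji (U+1F37F); \u00E9 is e with acute accent.
        System.out.println(escapeNonAscii("I love \uD83C\uDF7F and caf\u00E9"));
        // prints: I love &#x1f37f; and caf&#xe9;
    }
}
```

Note that this sketch leaves markup characters such as & and < untouched, so it is only suitable when the input is already safe to place in HTML.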
The emoji4j library normally works for this. It has a simple htmlify method for HTML encoding. For example:
String text = "I love 🍿";
EmojiUtils.htmlify(text);    // returns "I love &#127871;"
EmojiUtils.hexHtmlify(text); // returns "I love &#x1f37f;"
You may use the unbescape library (its tagline: powerful, fast and easy escape/unescape operations for Java). Add the dependency to the pom.xml file:
<dependency>
    <groupId>org.unbescape</groupId>
    <artifactId>unbescape</artifactId>
    <version>1.1.6.RELEASE</version>
</dependency>
The usage:
import org.unbescape.html.HtmlEscape;
import org.unbescape.html.HtmlEscapeLevel;
import org.unbescape.html.HtmlEscapeType;
<…>
final String inputString = "\uD83C\uDF7F";
final String escapedString = HtmlEscape.escapeHtml(
    inputString,
    HtmlEscapeType.HEXADECIMAL_REFERENCES,
    HtmlEscapeLevel.LEVEL_2_ALL_NON_ASCII_PLUS_MARKUP_SIGNIFICANT
);
// Here `escapedString` has the value: `&#x1f37f;`.
For your use case, either HtmlEscapeType.HTML4_NAMED_REFERENCES_DEFAULT_TO_HEXA or HtmlEscapeType.HTML5_NAMED_REFERENCES_DEFAULT_TO_HEXA should probably be used instead of HtmlEscapeType.HEXADECIMAL_REFERENCES.
I would use CharSequence::codePoints to get an IntStream of the code points, map each one to a string, and then collect them by concatenating into a single string:
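import java.util.stream.Collectors;

public String escape(final String s) {
    return s.codePoints()
            // Non-ASCII code points become hexadecimal references; ASCII passes through.
            .mapToObj(codePoint -> codePoint > 127 ?
                    "&#x" + Integer.toHexString(codePoint) + ";" :
                    new String(Character.toChars(codePoint)))
            .collect(Collectors.joining());
}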
For the specified input, this produces:
I love &#x1f37f;
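If decimal references are preferred, or if the text may also contain markup characters, the same stream approach could be adapted roughly as follows. This is only a sketch under those assumptions: the class name, the method names, and the choice of which ASCII characters to escape are illustrative and do not come from the answers above.

```java
import java.util.stream.Collectors;

// Hypothetical helper class; names and the escaped character set are assumptions.
final class StreamEntityEscaper {

    // Emits decimal character references for non-ASCII code points and for a few
    // markup-significant ASCII characters; everything else passes through unchanged.
    static String escapeDecimal(String s) {
        return s.codePoints()
                .mapToObj(cp -> needsEscape(cp)
                        ? "&#" + cp + ";"
                        : String.valueOf((char) cp))
                .collect(Collectors.joining());
    }

    // True for code points above 127 and for the characters & < > "
    static boolean needsEscape(int cp) {
        return cp > 127 || cp == '&' || cp == '<' || cp == '>' || cp == '"';
    }

    public static void main(String[] args) {
        // \uD83C\uDF7F is the popcorn emoji (U+1F37F, decimal 127871).
        System.out.println(escapeDecimal("I <3 \uD83C\uDF7F"));
        // prints: I &#60;3 &#127871;
    }
}
```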