
How can I convert every special character and emoji into its HTML entity using JavaScript?

My current code converts characters into entities as expected. But if I convert an emoji, it generates something like ��, which doesn't render as expected.

    String.prototype.toHtmlEntities = function() {
      return this.replace(/./gm, function(s) {
        // return "&#" + s.charCodeAt(0) + ";";
        return (s.match(/[a-z0-9\s]+/i)) ? s : "&#" + s.charCodeAt(0) + ";";
      });
    };

    console.log("".toHtmlEntities())
    document.write("".toHtmlEntities())

You're iterating over the code units of your string. Instead, you want to iterate over the code points. Most emoji consist of a single code point that is encoded as two UTF-16 code units, known as a surrogate pair: one high surrogate and one low surrogate. A surrogate displayed on its own doesn't represent a valid symbol, which is why you end up with � being rendered. If you use the u (unicode) flag on your regular expression, . will match whole code points, allowing you to iterate over each code point rather than each code unit. You can then read the code point value using codePointAt(0) and encode it as an HTML entity:

    String.prototype.toHtmlEntities = function() {
      return this.replace(/[^a-z0-9\s]/ugm, s => "&#" + s.codePointAt(0) + ";");
    };

    console.log("a".toHtmlEntities());
    document.write("a".toHtmlEntities());
    console.log("&".toHtmlEntities());
    document.write("&".toHtmlEntities());
    console.log("".toHtmlEntities()); // surrogate pair test
    document.write("".toHtmlEntities());
    console.log("".toHtmlEntities()); // ZWJ test
    document.write("".toHtmlEntities());
    console.log("❤️".toHtmlEntities()); // variation selector test
    document.write("❤️".toHtmlEntities()); // variation selector test
    console.log("ñ".toHtmlEntities()); // decomposed character test (length of 2)
    document.write("ñ".toHtmlEntities()); // decomposed character test (length of 2)
    console.log("ñ".toHtmlEntities()); // composed character (length of 1)
    document.write("ñ".toHtmlEntities()); // composed character (length of 1)
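To make the code unit versus code point distinction concrete, here is a minimal sketch. The sample emoji (U+1F600, written as an escape so the source stays ASCII) and the variable names are illustrative choices, not part of the original answer.

```javascript
// U+1F600 (grinning face), chosen purely as an example.
const face = "\u{1F600}";

// One code point, but two UTF-16 code units (a surrogate pair).
const unitCount = face.length;        // counts code units
const pointCount = [...face].length;  // string iteration walks code points

// charCodeAt reads a single code unit; codePointAt reads the whole code point.
const highSurrogate = face.charCodeAt(0);
const lowSurrogate = face.charCodeAt(1);
const codePoint = face.codePointAt(0);

// Without the u flag, "." matches each code unit, so two surrogate
// entities are produced, and HTML parsers replace those with U+FFFD.
const broken = face.replace(/./g, s => "&#" + s.charCodeAt(0) + ";");

// With the u flag, "." matches the full code point, giving one valid entity.
const fixed = face.replace(/./gu, s => "&#" + s.codePointAt(0) + ";");

console.log(unitCount, pointCount);                   // 2 1
console.log(highSurrogate, lowSurrogate, codePoint);  // 55357 56832 128512
console.log(broken);                                  // &#55357;&#56832;
console.log(fixed);                                   // &#128512;
```

The two-entity output on the `broken` line is the kind of result the question describes: each half of the pair is encoded separately, and neither half is a valid character on its own.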

If you only want to replace the emoji characters, you can use \p{Emoji} to match them (or another regular expression that matches your specific characters) and replace them with their code points, e.g.:

    String.prototype.toHtmlEntities = function() {
      return this.replace(/\p{Emoji}/ugm, s => '&#' + s.codePointAt(0) + ";");
    };

    console.log("a".toHtmlEntities());
    document.write("a".toHtmlEntities());
    console.log("&".toHtmlEntities());
    document.write("&".toHtmlEntities());
    console.log("".toHtmlEntities()); // surrogate pair test
    document.write("".toHtmlEntities());
    console.log("".toHtmlEntities()); // ZWJ test
    document.write("".toHtmlEntities());
    console.log("❤️".toHtmlEntities()); // variation selector test
    document.write("❤️".toHtmlEntities()); // variation selector test
    console.log("ñ".toHtmlEntities()); // decomposed character test (length of 2)
    document.write("ñ".toHtmlEntities()); // decomposed character test (length of 2)
    console.log("ñ".toHtmlEntities()); // composed character (length of 1)
    document.write("ñ".toHtmlEntities()); // composed character (length of 1)
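One thing worth checking before relying on \p{Emoji}: the Unicode Emoji property also covers the ASCII digits, # and *, because they are bases for keycap emoji. The sketch below compares it with \p{Extended_Pictographic}, which leaves those characters alone. The sample string and variable names are made up for illustration, and Extended_Pictographic differs from Emoji in other ways as well, so treat it as one possible alternative to test against your own input rather than a drop-in replacement.

```javascript
// Hypothetical sample text mixing digits, "#" and one emoji (U+1F600).
const sample = "Room 42 #1 \u{1F600}";

// Shared replacer: turn the matched code point into a decimal entity.
const toEntity = s => "&#" + s.codePointAt(0) + ";";

// \p{Emoji} also matches 0-9, "#" and "*", so they get encoded too.
const withEmoji = sample.replace(/\p{Emoji}/gu, toEntity);

// \p{Extended_Pictographic} skips the ASCII keycap bases.
const withPictographic = sample.replace(/\p{Extended_Pictographic}/gu, toEntity);

console.log(withEmoji);        // Room &#52;&#50; &#35;&#49; &#128512;
console.log(withPictographic); // Room 42 #1 &#128512;
```

The encoded digits still render correctly in HTML, so this only matters if you expected the rest of the text to pass through untouched.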

As always, if you're going to modify the prototype of built-in JavaScript objects, make sure you know the consequences of doing so. The recommended alternative is to create a new function and pass the string you want to convert into that function as an argument.
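A minimal sketch of that standalone-function approach follows. The function name toHtmlEntities is reused here only for familiarity, the m flag is dropped because the pattern has no ^ or $ anchors, and the sample inputs are illustrative assumptions.

```javascript
// Standalone helper: String.prototype is left untouched.
// Encodes every code point that is not a-z, 0-9 or whitespace.
function toHtmlEntities(str) {
  return String(str).replace(
    /[^a-z0-9\s]/gu,
    s => "&#" + s.codePointAt(0) + ";"
  );
}

// Plain ASCII: only the ampersand is encoded.
const ampersand = toHtmlEntities("a&b");

// A ZWJ sequence (man + zero width joiner + woman) is encoded one
// code point at a time, including the joiner itself.
const zwj = toHtmlEntities("\u{1F468}\u200D\u{1F469}");

console.log(ampersand); // a&#38;b
console.log(zwj);       // &#128104;&#8205;&#128105;
```

Because the helper takes its input as an argument, it cannot collide with methods that other scripts or future language versions might add to strings.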

