How can I convert every special character and emoji into its HTML entity using JavaScript?
My current code converts characters into entities as expected. But when I convert an emoji, it generates something like ��, which doesn't render as expected.
    String.prototype.toHtmlEntities = function() {
      return this.replace(/./gm, function(s) {
        // return "&#" + s.charCodeAt(0) + ";";
        return (s.match(/[a-z0-9\s]+/i)) ? s : "&#" + s.charCodeAt(0) + ";";
      });
    };

    console.log("".toHtmlEntities());
    document.write("".toHtmlEntities());
You're iterating over the code units of your string. Instead, you want to iterate over the code points. Most emojis consist of one code point that is encoded as two code units, called a surrogate pair: one high surrogate and one low surrogate. A surrogate displayed on its own doesn't represent a valid symbol, so � ends up being rendered.

If you use the u (unicode) flag on your regular expression, your . will then match based on code points, allowing you to iterate over each code point (rather than each code unit). You can then access the code point value using codePointAt(0), which you can then encode into an HTML entity:
    String.prototype.toHtmlEntities = function() {
      // The u flag makes the character class match whole code points.
      return this.replace(/[^a-z0-9\s]/ugm, s => "&#" + s.codePointAt(0) + ";");
    };

    console.log("a".toHtmlEntities());
    document.write("a".toHtmlEntities());
    console.log("&".toHtmlEntities());
    document.write("&".toHtmlEntities());
    console.log("".toHtmlEntities()); // surrogate pair test
    document.write("".toHtmlEntities());
    console.log("".toHtmlEntities()); // ZWJ test
    document.write("".toHtmlEntities());
    console.log("❤️".toHtmlEntities()); // variation selector test
    document.write("❤️".toHtmlEntities()); // variation selector test
    console.log("n\u0303".toHtmlEntities()); // decomposed character test (length of 2)
    document.write("n\u0303".toHtmlEntities()); // decomposed character test (length of 2)
    console.log("\u00F1".toHtmlEntities()); // composed character (length of 1)
    document.write("\u00F1".toHtmlEntities()); // composed character (length of 1)
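To make the code unit versus code point distinction concrete, here is a small sketch. It assumes U+1F600 (grinning face) as the sample emoji, written as an escape sequence so it survives copy and paste; the variable names are only illustrative.

```javascript
// U+1F600 (grinning face), chosen here purely as a sample emoji.
const emoji = "\u{1F600}";

// length and charCodeAt work on UTF-16 code units: this emoji occupies two.
const codeUnits = [];
for (let i = 0; i < emoji.length; i++) {
  codeUnits.push(emoji.charCodeAt(i));
}

// Spreading a string iterates by code point, so the surrogate pair stays together.
const codePoints = [...emoji].map(ch => ch.codePointAt(0));

console.log(emoji.length); // 2
console.log(codeUnits);    // [ 55357, 56832 ] (high and low surrogate)
console.log(codePoints);   // [ 128512 ]
```

Encoding each code unit separately yields two entities that each name a lone surrogate, which is why the original output rendered as replacement characters.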
If you just want to replace the emoji characters, you can use \p{Emoji} to match those (or another regular expression to match your specific characters), and replace those with their code points, e.g.:
    String.prototype.toHtmlEntities = function() {
      // \p{Emoji} requires the u flag; only matching characters are encoded.
      return this.replace(/\p{Emoji}/ugm, s => "&#" + s.codePointAt(0) + ";");
    };

    console.log("a".toHtmlEntities());
    document.write("a".toHtmlEntities());
    console.log("&".toHtmlEntities());
    document.write("&".toHtmlEntities());
    console.log("".toHtmlEntities()); // surrogate pair test
    document.write("".toHtmlEntities());
    console.log("".toHtmlEntities()); // ZWJ test
    document.write("".toHtmlEntities());
    console.log("❤️".toHtmlEntities()); // variation selector test
    document.write("❤️".toHtmlEntities()); // variation selector test
    console.log("n\u0303".toHtmlEntities()); // decomposed character test (length of 2)
    document.write("n\u0303".toHtmlEntities()); // decomposed character test (length of 2)
    console.log("\u00F1".toHtmlEntities()); // composed character (length of 1)
    document.write("\u00F1".toHtmlEntities()); // composed character (length of 1)
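One caveat worth checking before relying on \p{Emoji}: the Unicode Emoji property also covers the ASCII digits, # and *, so the version above encodes those as well. The sketch below demonstrates this and, on the assumption that digits should usually stay untouched, swaps in \p{Extended_Pictographic} instead. The function name encodePictographs is hypothetical and not part of the original answer.

```javascript
// The Emoji property includes ASCII digits; Extended_Pictographic does not.
const digitIsEmoji = /\p{Emoji}/u.test("7");
const digitIsPictographic = /\p{Extended_Pictographic}/u.test("7");

// Hypothetical helper: encode only pictographic code points as numeric entities.
function encodePictographs(str) {
  return str.replace(
    /\p{Extended_Pictographic}/gu,
    ch => "&#" + ch.codePointAt(0) + ";"
  );
}

console.log(digitIsEmoji);        // true
console.log(digitIsPictographic); // false
console.log(encodePictographs("Room 7 \u{1F600}")); // Room 7 &#128512;
```

Whether digits should be encoded depends on your use case, so treat the choice of property as something to verify against your own input.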
As always, if you're going to modify the prototype of built-in JavaScript objects, make sure you know the consequences of doing so. It is instead recommended to create a new function and pass the string you want to convert into that function as an argument.
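The recommendation to use a plain function instead of extending String.prototype could look like the following minimal sketch. It assumes the same rule as the first answer snippet (encode everything except ASCII letters, digits and whitespace), except that uppercase letters are also left alone; the standalone function name toHtmlEntities is simply reused for illustration.

```javascript
// Standalone version: takes the string as an argument, leaves String.prototype alone.
function toHtmlEntities(str) {
  // The u flag makes each match a whole code point, so surrogate pairs stay intact.
  return String(str).replace(
    /[^A-Za-z0-9\s]/gu,
    ch => "&#" + ch.codePointAt(0) + ";"
  );
}

console.log(toHtmlEntities("a"));         // a
console.log(toHtmlEntities("&"));         // &#38;
console.log(toHtmlEntities("\u{1F600}")); // &#128512;
console.log(toHtmlEntities("n\u0303"));   // n&#771; (decomposed: only the combining tilde is encoded)
```

Because nothing is attached to a built-in prototype, the function cannot collide with other libraries or with future additions to the language.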