How to convert Unicode characters to HTML numeric entities using plain JavaScript
I'm trying to convert innerHTML containing special characters into their original &#...; entity values, but I can't get it working for Unicode values. Where am I going wrong?
The code is meant to take "Orig", encode it, and place the result into "Copy":
Orig: 1:🙂__2:𝌆__3:ß__4:Ü__5:X__6:Y__7:팆__8:Z__9:⚠️__10:⚠️__11:⚠__12:🙂
Copy: 1:🙂�__2:𝌆�__3:ß__4:Ü__5:X__6:Y__7:팆__8:Z__9:⚠️__10:⚠️__11:⚠__12:🙂�
... but obviously the dreaded black diamonds are appearing!
if (!String.prototype.codePointAt) {
  String.prototype.codePointAt = function(pos) {
    pos = isNaN(pos) ? 0 : pos;
    var str = String(this),
      code = str.charCodeAt(pos),
      next = str.charCodeAt(pos + 1);
    // If a surrogate pair
    if (0xD800 <= code && code <= 0xDBFF && 0xDC00 <= next && next <= 0xDFFF) {
      return ((code - 0xD800) * 0x400) + (next - 0xDC00) + 0x10000;
    }
    return code;
  };
}

/**
 * Encodes special html characters
 * @param string
 * @return {*}
 */
function html_encode(s) {
  var ret_val = '';
  for (var i = 0; i < s.length; i++) {
    if (s.codePointAt(i) > 127) {
      ret_val += '&#' + s.codePointAt(i) + ';';
    } else {
      ret_val += s.charAt(i);
    }
  }
  return ret_val;
}

var v = html_encode(document.getElementById('orig').innerHTML);
document.getElementById('copy').innerHTML = v;
document.getElementById('values').value = v;
//console.log(v);
div { padding:10px; border:solid 1px grey; } textarea { width:calc(100% - 30px); height:50px; padding:10px; }
Orig:<div id='orig'>1:🙂__2:𝌆__3:ß__4:Ü__5:X__6:Y__7:팆__8:Z__9:⚠️__10:⚠️__11:⚠__12:🙂</div> Copy:<div id='copy'></div> Values:<textarea id='values'></textarea>
(A jsfiddle is available at https://jsfiddle.net/Abeeee/k6e4svqa/24/ )
I've been through the various suggestions on "How to convert characters to HTML entities using plain JavaScript", including he.js, which looks the most promising, but the script I downloaded doesn't compile (Unexpected Token around line 32: .. var encodeMap = <%= encodeMap %>;).
I'm not sure where to go with this.
JavaScript strings are UTF-16. A character outside the Basic Multilingual Plane (a code point above U+FFFF) is stored as a surrogate pair, which takes up two 16-bit words. The length property of a string counts 16-bit words, not characters, so "🙂".length returns 2.
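To make the difference between 16-bit words and characters concrete, here is a minimal sketch. The helper names units and codePoints are made up for this illustration; the emoji is written as an escape so the snippet survives copy and paste.

```javascript
// "\u{1F642}" is the smiley from the question, a code point above U+FFFF.
const smiley = "\u{1F642}";
// "\u00DF" is the German sharp s, which fits in a single 16-bit word.
const eszett = "\u00DF";

// length counts 16-bit words (UTF-16 code units).
const units = (str) => str.length;
// Spreading a string uses its iterator, which steps by code point.
const codePoints = (str) => [...str].length;

console.log(units(smiley), codePoints(smiley)); // 2 1
console.log(units(eszett), codePoints(eszett)); // 1 1
```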
codePointAt(i) does not address the i-th character but the i-th 16-bit word. Hence, a surrogate pair shows up over two consecutive codePointAt invocations. Per the spec, if the word at position i is a high surrogate followed by a low surrogate, the function returns the full code point: "🙂".codePointAt(0) is 128578. But "🙂".codePointAt(1) lands on the second half of the pair and returns only the low surrogate, 56898. The entity &#56898; does not name a valid character on its own, so the browser shows the black diamond (the replacement character, U+FFFD).
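The two values can be checked directly in a console. This is only a small sketch; the variable names are chosen for illustration.

```javascript
const smiley = "\u{1F642}"; // the smiley from the question

// Position 0 holds the high surrogate, so the full code point comes back.
const atHigh = smiley.codePointAt(0);
// Position 1 holds the low surrogate, which is returned as-is.
const atLow = smiley.codePointAt(1);

console.log(atHigh, atHigh.toString(16)); // 128578 "1f642"
console.log(atLow, atLow.toString(16));   // 56898 "de42"

// charCodeAt never combines the pair; it shows the raw high surrogate.
console.log(smiley.charCodeAt(0).toString(16)); // "d83d"
```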
Thus you need to skip one position whenever codePointAt returns a value above 0xFFFF, because that means position i held a high surrogate and position i + 1 holds its low surrogate.
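If you would rather keep the index-based loop from the question, the skip could look roughly like this. It is a sketch only: htmlEncodeByIndex is a name invented here, and it assumes a native or polyfilled codePointAt.

```javascript
// Index-based variant: advance one extra position after a surrogate pair.
function htmlEncodeByIndex(s) {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const code = s.codePointAt(i);
    if (code > 127) {
      out += "&#" + code + ";";
      // A value above 0xFFFF means two 16-bit words were consumed,
      // so step over the low surrogate at i + 1.
      if (code > 0xffff) {
        i++;
      }
    } else {
      out += s.charAt(i);
    }
  }
  return out;
}

console.log(htmlEncodeByIndex("1:\u{1F642}__3:\u00DF")); // 1:&#128578;__3:&#223;
```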
Following the example in the spec, instead of iterating over each 16-bit word in the string, use a construct that loops over each character.
for (let char of aString) {} does just that, because the string iterator steps through the string one code point at a time.
function html_encode(s) {
  var ret_val = '';
  for (let char of s) {
    const code = char.codePointAt(0);
    if (code > 127) {
      ret_val += '&#' + code + ';';
    } else {
      ret_val += char;
    }
  }
  return ret_val;
}

let v = html_encode(document.getElementById('orig').innerHTML);
document.getElementById('copy').innerHTML = v;
document.getElementById('values').value = v;
div { padding:10px; border:solid 1px grey; } textarea { width:calc(100% - 30px); height:50px; padding:10px; }
Orig:<div id='orig'>1:🙂__2:𝌆__3:ß__4:Ü__5:X__6:Y__7:팆__8:Z__9:⚠️__10:⚠️__11:⚠__12:🙂</div> Copy:<div id='copy'></div> Values:<textarea id='values'></textarea>
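To check the for...of approach without a browser, the same logic can be run on plain strings. This is a sketch under that assumption: htmlEncode is a DOM-free copy of the function above under a hypothetical name, and the inputs are written as escapes.

```javascript
// DOM-free copy of the for...of encoder, for a quick check in Node or a console.
function htmlEncode(s) {
  let out = "";
  for (const char of s) {
    const code = char.codePointAt(0);
    out += code > 127 ? "&#" + code + ";" : char;
  }
  return out;
}

// Items 1 to 4 of the sample: two astral characters, then two BMP characters.
console.log(htmlEncode("1:\u{1F642}__2:\u{1D306}__3:\u00DF__4:\u00DC"));
// 1:&#128578;__2:&#119558;__3:&#223;__4:&#220;

// The warning sign with a variation selector is two code points (U+26A0, U+FE0F),
// so it becomes two entities.
console.log(htmlEncode("9:\u26A0\uFE0F")); // 9:&#9888;&#65039;
```

Note that an emoji followed by a variation selector is encoded as two entities; the browser recombines them when rendering, so the output still displays as one symbol.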