How to convert unicode characters to HTML numeric entities using plain Javascript

I'm trying to convert innerHTML containing special characters into the corresponding &#...; numeric entity values, but I can't seem to get it working for unicode values. Where am I going wrong?

The code takes the content of "Orig", encodes it, and places the result into "Copy":

Orig: 1:🙂__2:𝌆__3:ß__4:Ü__5:X__6:Y__7:팆__8:Z__9:⚠️__10:⚠️__11:⚠__12:🙂

Copy: 1:🙂 __2:𝌆 __3:ß__4:Ü__5:X__6:Y__7:팆__8:Z__9:⚠️__10:⚠️__11:⚠__12:🙂

... but obviously the dreaded black diamonds are appearing!

    if (!String.prototype.codePointAt) {
      String.prototype.codePointAt = function(pos) {
        pos = isNaN(pos) ? 0 : pos;
        var str = String(this),
          code = str.charCodeAt(pos),
          next = str.charCodeAt(pos + 1);
        // If a surrogate pair
        if (0xD800 <= code && code <= 0xDBFF && 0xDC00 <= next && next <= 0xDFFF) {
          return ((code - 0xD800) * 0x400) + (next - 0xDC00) + 0x10000;
        }
        return code;
      };
    }

    /**
     * Encodes special html characters
     * @param string
     * @return {*}
     */
    function html_encode(s) {
      var ret_val = '';
      for (var i = 0; i < s.length; i++) {
        if (s.codePointAt(i) > 127) {
          ret_val += '&#' + s.codePointAt(i) + ';';
        } else {
          ret_val += s.charAt(i);
        }
      }
      return ret_val;
    }

    var v = html_encode(document.getElementById('orig').innerHTML);
    document.getElementById('copy').innerHTML = v;
    document.getElementById('values').value = v;
    //console.log(v);

    div {
      padding:10px;
      border:solid 1px grey;
    }
    textarea {
      width:calc(100% - 30px);
      height:50px;
      padding:10px;
    }

    Orig:<div id='orig'>1:🙂__2:𝌆__3:ß__4:Ü__5:X__6:Y__7:팆__8:Z__9:⚠️__10:&#9888;&#65039;__11:&#9888;__12:&#128578;</div>
    Copy:<div id='copy'></div>
    Values:<textarea id='values'></textarea>

(A jsfiddle is available at https://jsfiddle.net/Abeeee/k6e4svqa/24/ )

I've been through the various suggestions in "How to convert characters to HTML entities using plain JavaScript", including he.js, which looks the most promising, but the script I downloaded doesn't compile (Unexpected Token around line 32: .. var encodeMap = <%= encodeMap %>;).

I'm not sure where to go with this.

Javascript strings are UTF-16. A character outside the Basic Multilingual Plane (above U+FFFF) is encoded as a surrogate pair and takes up two 16-bit code units. The length property of a string counts 16-bit code units, not characters. Thus "🙂".length returns 2.
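The difference between code units and code points can be checked directly. This is a small sketch; the variable names are illustrative only, and Array.from is assumed to be available (ES2015 or later):

```javascript
// "🙂" is U+1F642, which lies outside the Basic Multilingual Plane.
const smiley = '\u{1F642}';

// length counts 16-bit UTF-16 code units.
const unitCount = smiley.length;

// Array.from uses the string iterator, which walks whole code points.
const codePointCount = Array.from(smiley).length;

console.log(unitCount, codePointCount); // 2 1
```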

codePointAt(i) does not address the i-th character, but the i-th 16-bit code unit. Hence, a character encoded as a surrogate pair will show up in two consecutive codePointAt invocations. From the specs, if the unit at position i is a high surrogate followed by a low surrogate, the function returns the full code point value: "🙂".codePointAt(0) returns 128578. But "🙂".codePointAt(1) returns only the low surrogate, 56898, which is not a valid character on its own and renders as that black diamond.
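The two results can be reproduced in a console. This is a minimal sketch with illustrative variable names:

```javascript
const face = '\u{1F642}'; // "🙂", stored as the surrogate pair D83D DE42

// Index 0 is the high surrogate, so the full code point comes back.
const atIndex0 = face.codePointAt(0); // 128578 (0x1F642)

// Index 1 is the low surrogate on its own.
const atIndex1 = face.codePointAt(1); // 56898 (0xDE42)

// charCodeAt never combines surrogates; it returns the raw code unit.
const rawHigh = face.charCodeAt(0); // 55357 (0xD83D)

console.log(atIndex0, atIndex1, rawHigh);
```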

Thus you need to skip one position whenever codePointAt returns a value above 0xFFFF, i.e. a code point that was read from a surrogate pair.
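If the index-based loop from the question is to be kept, the skip could look roughly like this. It is a sketch under the assumption that a native codePointAt is available; htmlEncodeByIndex is a hypothetical name, not something from the original post:

```javascript
// Hypothetical helper: index-based encoding that steps over low surrogates.
function htmlEncodeByIndex(s) {
  let out = '';
  for (let i = 0; i < s.length; i++) {
    const code = s.codePointAt(i);
    if (code > 127) {
      out += '&#' + code + ';';
    } else {
      out += s.charAt(i);
    }
    // A value above 0xFFFF was read from a surrogate pair,
    // so the next code unit is its low surrogate: skip it.
    if (code > 0xFFFF) {
      i++;
    }
  }
  return out;
}

console.log(htmlEncodeByIndex('1:\u{1F642}__3:\u00DF')); // 1:&#128578;__3:&#223;
```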

Following the example in the specs, instead of iterating through each 16-bit code unit in the string, use a construct that loops through each character. for (let char of aString) {} does just that.

    function html_encode(s) {
      var ret_val = '';
      // for...of iterates by code point, so a surrogate pair arrives as one char
      for (let char of s) {
        const code = char.codePointAt(0);
        if (code > 127) {
          ret_val += '&#' + code + ';';
        } else {
          ret_val += char;
        }
      }
      return ret_val;
    }

    let v = html_encode(document.getElementById('orig').innerHTML);
    document.getElementById('copy').innerHTML = v;
    document.getElementById('values').value = v;

    div {
      padding:10px;
      border:solid 1px grey;
    }
    textarea {
      width:calc(100% - 30px);
      height:50px;
      padding:10px;
    }

    Orig:<div id='orig'>1:🙂__2:𝌆__3:ß__4:Ü__5:X__6:Y__7:팆__8:Z__9:⚠️__10:&#9888;&#65039;__11:&#9888;__12:&#128578;</div>
    Copy:<div id='copy'></div>
    Values:<textarea id='values'></textarea>
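As an alternative that is not part of the answer above, the same encoding could probably be written with a regular expression, assuming an engine that supports the u flag (ES2015 or later). With that flag, a character class matches whole code points rather than single code units. htmlEncodeRegex is a hypothetical name used only for this sketch:

```javascript
// Hypothetical helper: replace every non-ASCII code point with &#N;.
function htmlEncodeRegex(s) {
  // The u flag makes [^\x00-\x7F] match a full code point,
  // so a surrogate pair is passed to the callback as one string.
  return s.replace(/[^\x00-\x7F]/gu, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  });
}

// U+26A0 followed by the variation selector U+FE0F yields two entities.
console.log(htmlEncodeRegex('\u26A0\uFE0F')); // &#9888;&#65039;
```

This does not need a DOM, so it can be tried in Node or a browser console before being wired up to innerHTML.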

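Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.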
