How to use / parse HTML entities and Unicode characters in a JavaScript string

I want to use &zwnj;, &nbsp;, &deg; and &#8451; in a JavaScript string, but this doesn't work:

const str = `&zwnj; &nbsp; &deg; &#8451;`;

If I do console.log(str), I would expect to see something like this (note that the &zwnj; would not be visible and the &nbsp; would just look like a regular space):

   ° ℃

I've seen this other question where the suggested solution is to change these entities to their hexadecimal equivalent, but that's not possible as this string comes from the backend with the entities already in place.

Even if the HTML entities are already in that string, one way or another you need to replace them with their actual characters or their escape-notation equivalents.
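
As a quick illustration of why that replacement is needed: JavaScript only interprets its own escape sequences inside string literals, so an HTML entity stays as literal text. A minimal sketch (the variable names are just for illustration):

```javascript
// An HTML entity inside a JavaScript string is just a run of ordinary characters:
const entity = '&deg;';
console.log(entity.length); // 5

// A JavaScript escape sequence, however, is converted to the character it names:
const escaped = '\xB0';
console.log(escaped.length); // 1
console.log(escaped === '°'); // true
```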

If they were not in the string already, one option would be to just look them up:

Or calculate them:

Or, if you can type or copy-paste the original character from somewhere else, you can get its code with String.prototype.charCodeAt(), which returns the UTF-16 code unit at the given index as a decimal number, and then convert that number to hexadecimal with Number.prototype.toString(), using its radix parameter:

'°'.charCodeAt(0); // 176
'°'.charCodeAt(0).toString(16); // "b0"

And then use the escape notation to represent them with their Unicode code. Note that depending on the code, we use the \uXXXX or the \xXX notation:

const str = `\u200C \xA0 \xB0 \u2103`;

console.log(str);
console.log(str.split(' ').map(s => `${ s.charCodeAt(0) } = ${ s.charCodeAt(0).toString(16) }`));
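
The choice between the two notations can be sketched as a small helper. Note that toEscape is a hypothetical name, not part of any library; it assumes a single UTF-16 code unit and pads the hexadecimal value, since \x takes exactly two digits and \u exactly four:

```javascript
// Hypothetical helper: returns the escape notation for the first
// UTF-16 code unit of `char`.
function toEscape(char) {
  const hex = char.charCodeAt(0).toString(16).toUpperCase();

  // \xXX covers code units up to 0xFF; everything else needs \uXXXX.
  return hex.length <= 2
    ? `\\x${ hex.padStart(2, '0') }`
    : `\\u${ hex.padStart(4, '0') }`;
}

console.log(toEscape('°')); // \xB0
console.log(toEscape('℃')); // \u2103
console.log(toEscape('\u200C')); // \u200C
```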

In your case, you need to parse that string, extract the entities and replace them with the actual characters they represent.

I've made this snippet so that you can just paste characters or write HTML entities and get their Unicode codes, but it will also serve as an example of how to dynamically parse those HTML entities:

const sandbox = document.getElementById('sandbox');
const input = document.getElementById('input');
const list = document.getElementById('list');

function parseInput() {
  let text = input.value;

  // Match one entity at a time (named, decimal or hexadecimal):
  (text.match(/&#?\w+;/g) || []).forEach(entity => {
    // Insert the HTML entity as HTML in an HTML element:
    sandbox.innerHTML = entity;

    // Read the element's textContent to get the parsed entity (the actual character):
    text = text.replace(entity, sandbox.textContent);
  });

  list.innerHTML = text.split('').map(char => {
    const dec = char.charCodeAt(0);
    const hex = dec.toString(16).toUpperCase();

    // \x takes exactly two hexadecimal digits and \u exactly four, so pad as needed:
    const code = hex.length <= 2 ? `\\x${ hex.padStart(2, '0') }` : `\\u${ hex.padStart(4, '0') }`;

    // Zero-pad the code to four digits for the lookup link:
    const link = hex.padStart(4, '0');

    return `
      <li>
        <div>${ char }</div>
        <div>${ dec }</div>
        <div>${ hex }</div>
        <div><a href="http://www.fileformat.info/info/unicode/char/${ link }">${ code }</a></div>
      </li>
    `;
  }).join('');
}

input.value = '&zwnj;&nbsp;°℃';
input.oninput = parseInput;

parseInput();
body {
  margin: 0;
  padding: 8px;
  font-family: monospace;
}

#input {
  margin-bottom: 16px;
  border-radius: 2px;
  border: 0;
  padding: 8px;
  font-family: monospace;
  font-size: 16px;
  font-weight: bold;
  box-shadow: 0 0 32px rgba(0, 0, 0, .25);
  width: 100%;
  box-sizing: border-box;
  height: 40px;
  outline: none;
}

#sandbox {
  display: none;
}

#list {
  list-style: none;
  margin: 0;
  padding: 0;
  border-top: 1px solid #EEE;
}

#list > li {
  display: flex;
  border-bottom: 1px solid #EEE;
}

#list > li > div {
  width: 25%;
  box-sizing: border-box;
  padding: 8px;
}

#list > li > div + div {
  border-left: 1px solid #EEE;
}

<div id="sandbox"></div>
<input type="text" id="input" />
<ul id="list"></ul>
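
The snippet above relies on the browser's HTML parser. Where no DOM is available (in Node.js, for example), something roughly similar can be sketched with a regular expression. Here decodeEntities and NAMED_ENTITIES are hypothetical names, and the lookup table only covers a handful of named entities as an assumption for this example; a complete table would be much larger. Unknown entities are left untouched:

```javascript
// Assumed, deliberately tiny table of named entities (not exhaustive).
const NAMED_ENTITIES = new Map([
  ['zwnj', '\u200C'],
  ['nbsp', '\xA0'],
  ['deg', '\xB0'],
  ['amp', '&'],
  ['lt', '<'],
  ['gt', '>'],
  ['quot', '"'],
]);

// Hypothetical helper: replaces named, decimal and hexadecimal entities in `text`.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);

      // Leave out-of-range values untouched instead of throwing a RangeError.
      return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
    }

    // Named entity: look it up, falling back to the original text.
    return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match;
  });
}

const decoded = decodeEntities('&zwnj; &nbsp; &deg; &#8451; &#x2103;');

console.log(decoded.split(' ').map(s => s.charCodeAt(0).toString(16)));
// [ '200c', 'a0', 'b0', '2103', '2103' ]
```

Unlike the DOM approach, this sketch does not reproduce every rule of the HTML parser, so it is best treated as a starting point for strings whose entities are known in advance.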
