How to use / parse HTML entities and Unicode characters in a JavaScript string
I want to use &zwnj;, &nbsp;, &#176; and &#8451; in a JavaScript string, but this doesn't work:
const str = `&zwnj; &nbsp; &#176; &#8451;`;
If I do console.log(str), I would expect to see something like this (note the &zwnj; would not be visible and the &nbsp; would just look like a regular space):
° ℃
I've seen this other question where the suggested solution is to change these entities to their hexadecimal equivalents, but that's not possible, as this string comes from the backend with the entities already in place.
Even if the HTML entities are already in that string, one way or another you need to replace them with their actual characters or their escape notation equivalents.
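To see why, note that a JavaScript string literal gives no special meaning to HTML entities: each one is just a run of ordinary characters, so it has to be swapped for the real character (or a JavaScript escape) before it prints as intended. A minimal check, runnable in Node.js or a browser console:

```javascript
// An HTML entity inside a JS string is just literal text:
const withEntity = '&#176;C';
// The same text with the entity replaced by the actual character:
const withChar = '\u00B0C';

console.log(withEntity.length); // 7 ('&', '#', '1', '7', '6', ';', 'C')
console.log(withChar.length);   // 2 ('°', 'C')
console.log(withEntity === withChar); // false
console.log(withEntity.replace('&#176;', '\u00B0') === withChar); // true
```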
If they were not in the string already, one option would be to just look them up:
&zwnj; - ZERO WIDTH NON-JOINER: 0x200C.
&nbsp; - NO-BREAK SPACE: 0x00A0.
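If you go the lookup route, one way to apply the result is a small table from entity name to code, turned into characters with String.fromCharCode(). The table here is a hypothetical, hand-written subset covering only the two named entities above, not a complete list of HTML named entities:

```javascript
// Hypothetical hand-written lookup table: entity name -> code.
// Only covers the named entities looked up above.
const NAMED_ENTITIES = {
  zwnj: 0x200C, // ZERO WIDTH NON-JOINER
  nbsp: 0x00A0, // NO-BREAK SPACE
};

// Build the actual characters from their looked-up codes.
const zwnj = String.fromCharCode(NAMED_ENTITIES.zwnj);
const nbsp = String.fromCharCode(NAMED_ENTITIES.nbsp);

console.log(zwnj === '\u200C'); // true
console.log(nbsp === '\xA0');   // true
console.log(nbsp === ' ');      // false: a no-break space is not a regular space
```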
Or calculate them:
&#176; - DEGREE SIGN (°): 0x00B0 (176 in decimal is B0 in hexadecimal).
&#8451; - DEGREE CELSIUS (℃): 0x2103 (8451 in decimal is 2103 in hexadecimal).
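The decimal/hexadecimal arithmetic above can be done directly in JavaScript: Number.prototype.toString(16) goes from decimal to hexadecimal, parseInt(..., 16) goes back, and String.fromCodePoint() turns the number into the character. A short sketch using the two numeric entities:

```javascript
// Decimal numbers as they appear in numeric entities like &#176; and &#8451;
const degree = 176;
const celsius = 8451;

// Decimal -> hexadecimal (lowercase, no padding)
console.log(degree.toString(16));  // "b0"
console.log(celsius.toString(16)); // "2103"

// Hexadecimal -> decimal
console.log(parseInt('b0', 16));   // 176
console.log(parseInt('2103', 16)); // 8451

// Number -> actual character
console.log(String.fromCodePoint(degree));  // "°"
console.log(String.fromCodePoint(celsius)); // "℃"
```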
Or, if you can type or copy-paste the original character from somewhere else, you can get its numeric code with String.prototype.charCodeAt(), which returns the UTF-16 code unit at the given index, and then convert that decimal number to hexadecimal with Number.prototype.toString(), passing 16 as its radix parameter:
'°'.charCodeAt(0); // 176
'°'.charCodeAt(0).toString(16); // "b0"
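One caveat: charCodeAt() returns a single UTF-16 code unit, which equals the full code only for characters up to U+FFFF. Characters above that are stored as a surrogate pair, and String.prototype.codePointAt() is the method that returns their actual code point. For example:

```javascript
const celsius = '\u2103';    // ℃, U+2103: one UTF-16 code unit
const grinning = '\u{1F600}'; // 😀, U+1F600: two UTF-16 code units (a surrogate pair)

console.log(celsius.charCodeAt(0).toString(16));  // "2103"
console.log(celsius.codePointAt(0).toString(16)); // "2103"

console.log(grinning.length);                      // 2
console.log(grinning.charCodeAt(0).toString(16));  // "d83d" (high surrogate only)
console.log(grinning.codePointAt(0).toString(16)); // "1f600"
```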
Then use the escape notation to represent each character by its Unicode code. Note that, depending on the code, we use the \uXXXX or the \xXX notation: \xXX takes exactly two hex digits, so it only reaches up to 0xFF, while \uXXXX takes four:
const str = `\u200C \xA0 \xB0 \u2103`;

console.log(str);
console.log(str.split(' ').map(s => `${ s.charCodeAt(0) } = ${ s.charCodeAt(0).toString(16) }`));
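For reference, ES2015 also added a third form, \u{...}, which accepts any code point (including ones above U+FFFF). The checks below confirm that the different notations produce the same characters:

```javascript
// Three notations for DEGREE SIGN (U+00B0):
console.log('\xB0' === '\u00B0');   // true
console.log('\u00B0' === '\u{B0}'); // true

// DEGREE CELSIUS (U+2103) is above 0xFF, so \x cannot express it:
console.log('\u2103' === '\u{2103}'); // true
console.log(String.fromCodePoint(0x2103) === '\u2103'); // true

// \u{...} also covers code points above U+FFFF, e.g. U+1F600:
console.log('\u{1F600}' === '\uD83D\uDE00'); // true
```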
In your case, you need to parse that string, extract the entities and replace them with the actual characters they represent.
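Outside a browser (for example in Node.js, where there is no document to parse HTML with), a minimal sketch of that extract-and-replace step is a regular expression plus String.fromCodePoint(). It assumes a hypothetical, hand-maintained NAMED table and a hypothetical decodeEntities() helper; numeric entities (&#176;, &#x2103;) are always decoded, but unknown names are left untouched, so full coverage would need the complete HTML named-entity list:

```javascript
// Hypothetical, hand-maintained subset of HTML named entities.
const NAMED = {
  zwnj: 0x200C,
  nbsp: 0x00A0,
  deg: 0x00B0,
  amp: 0x0026,
  lt: 0x003C,
  gt: 0x003E,
};

// Matches &#decimal;, &#xhex; and &name; (one entity per match).
const ENTITY = /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/g;

function decodeEntities(text) {
  return text.replace(ENTITY, (match, dec, hex, name) => {
    let code;
    if (dec !== undefined) code = parseInt(dec, 10);
    else if (hex !== undefined) code = parseInt(hex, 16);
    else if (Object.prototype.hasOwnProperty.call(NAMED, name)) code = NAMED[name];

    // Leave unknown names and out-of-range numbers untouched.
    if (code === undefined || code > 0x10FFFF) return match;
    return String.fromCodePoint(code);
  });
}

const str = decodeEntities('&zwnj; &nbsp; &#176; &#8451;');
console.log(str === '\u200C \xA0 \xB0 \u2103'); // true
```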
I've made this snippet so that you can just paste characters or write HTML entities and get their Unicode codes, but it also serves as an example of how to dynamically parse those HTML entities:
const sandbox = document.getElementById('sandbox');
const input = document.getElementById('input');
const list = document.getElementById('list');

function parseInput() {
  let text = input.value;

  // Match one entity at a time (&zwnj;, &#8451;, &#x2103;...), not everything between the first & and the last ;
  (text.match(/&#?\w+;/g) || []).forEach(entity => {
    // Insert the HTML entity as HTML in an HTML element:
    sandbox.innerHTML = entity;

    // Read the element's textContent to get the parsed entity (the actual character):
    text = text.replace(entity, sandbox.textContent);
  });

  list.innerHTML = text.split('').map(char => {
    const dec = char.charCodeAt(0);
    const hex = dec.toString(16).toUpperCase();
    // \xXX only covers 0x00-0xFF; everything else needs a zero-padded \uXXXX:
    const code = dec <= 0xFF ? `\\x${ hex.padStart(2, '0') }` : `\\u${ hex.padStart(4, '0') }`;
    // fileformat.info URLs use at least 4 hex digits:
    const link = hex.padStart(4, '0');

    return `
      <li>
        <div>${ char }</div>
        <div>${ dec }</div>
        <div>${ hex }</div>
        <div><a href="http://www.fileformat.info/info/unicode/char/${ link }">${ code }</a></div>
      </li>
    `;
  }).join('');
}

input.value = '&zwnj; &nbsp; &#176; &#8451;';
input.oninput = parseInput;

parseInput();
body {
  margin: 0;
  padding: 8px;
  font-family: monospace;
}

#input {
  margin-bottom: 16px;
  border-radius: 2px;
  border: 0;
  padding: 8px;
  font-family: monospace;
  font-size: 16px;
  font-weight: bold;
  box-shadow: 0 0 32px rgba(0, 0, 0, .25);
  width: 100%;
  box-sizing: border-box;
  height: 40px;
  outline: none;
}

#sandbox {
  display: none;
}

#list {
  list-style: none;
  margin: 0;
  padding: 0;
  border-top: 1px solid #EEE;
}

#list > li {
  display: flex;
  border-bottom: 1px solid #EEE;
}

#list > li > div {
  width: 25%;
  box-sizing: border-box;
  padding: 8px;
}

#list > li > div + div {
  border-left: 1px solid #EEE;
}
<div id="sandbox"></div>
<input type="text" id="input" />
<ul id="list"></ul>
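The escape column in the snippet can be factored out into a small pure function, which also makes it testable outside the browser. A sketch with a hypothetical helper name, toEscape, that uses codePointAt() so characters above U+FFFF get the \u{...} form instead of a lone surrogate:

```javascript
// Hypothetical helper: returns the JavaScript escape notation for one character.
function toEscape(char) {
  const code = char.codePointAt(0);
  const hex = code.toString(16).toUpperCase();
  if (code <= 0xFF) return `\\x${hex.padStart(2, '0')}`;   // \xXX
  if (code <= 0xFFFF) return `\\u${hex.padStart(4, '0')}`; // \uXXXX
  return `\\u{${hex}}`;                                     // \u{XXXXX}
}

// for...of walks code points, not UTF-16 code units.
for (const char of '\u200C\xA0\xB0\u2103\u{1F600}') {
  console.log(toEscape(char));
}
// \u200C
// \xA0
// \xB0
// \u2103
// \u{1F600}
```

Iterating with for...of (or [...text]) instead of text.split('') is what keeps a surrogate pair together as one character.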