
JavaScript encoding issue with accented characters

I have a page with a UTF-8 header:

<meta charset="utf-8" />

In the page I use the Umbraco dictionary to fetch content in various languages. When I print this in German on the page, it appears fine:

<h1>@library.GetDictionaryItem("A")</h1>

resolves to:

<h1>Ä</h1> in German

However, if I output it via a script:

<script type="text/javascript" charset="utf-8">
  var a = "@library.GetDictionaryItem("A")";
  alert(a);
</script>

The alert prints:

&#228;

If I do

<script type="text/javascript" charset="utf-8">
  var a = "Ä";
  alert(a);
</script>

The alert prints:

Ä

So what could explain this behaviour, and how can I fix the alert? As far as I can see, everything is UTF-8, and the dictionary and the page encoding are fine. The problem happens within JavaScript.

From what I can see from the table here, JavaScript resolves the character into its numeric value. I tried "escape, encodeUrl, decodeUrl" etc. with no luck.

chr  HexCode  Numeric  HTML entity  escape(chr)  encodeURI(chr)
ä    \xE4     &#228;   &auml;       %E4          %C3%A4

(FWIW: Character entity &#228; is ä, not Ä.)

This has nothing to do with character encoding. You're outputting an HTML entity to a JavaScript string, and then asking the browser to display that JavaScript string (via alert) without doing anything to interpret HTML. It's exactly as though you had actually typed:
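A quick way to check which character a decimal entity refers to, as a minimal sketch (the variable names are illustrative only):

```javascript
// Code point 228 (0xE4) is the lowercase a-umlaut;
// the uppercase A-umlaut is code point 196 (0xC4).
var lower = String.fromCharCode(228);
var upper = String.fromCharCode(196);

console.log(lower, upper);
console.log(lower.toUpperCase() === upper); // true
```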

<h1>&#228;</h1>

...(which will show ä on the page), and

<script>
var a = "&#228;";
alert(a);
</script>

...which won't. The HTML entity isn't being used anywhere that understands HTML entities. alert doesn't interpret HTML.

But if you did this:
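To make that concrete, here is a small sketch (variable names are illustrative) showing that, as far as JavaScript is concerned, the entity is just six ordinary characters and not the single character it stands for in HTML:

```javascript
var entity = "&#228;";   // what the server rendered into the script block
var literal = "\u00E4";  // the actual character, written as a JavaScript escape

console.log(entity.length);      // 6
console.log(literal.length);     // 1
console.log(entity === literal); // false
```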

<script>
var a = "&#228;";
var div = document.createElement('div');
div.innerHTML = a;
document.body.appendChild(div);
</script>

...you'd see the character on the page, because we're giving the entity to something (innerHTML) that will interpret HTML. And so if you make that first line:

var a = "@library.GetDictionaryItem("A")";

...and then use a in an HTML context (as above), you'll get the ä in the document.

If you always get a decimal numeric character entity (like &#228;) from Umbraco, then since those define Unicode code points and JavaScript (mostly) uses Unicode code points in its strings*, you can parse the entity easily enough:

function characterFromDecimalNumericEntity(str) {
    var decNumEntRex = /^\&#(\d+);$/;
    var match = decNumEntRex.exec(str);
    var codepoint = match ? parseInt(match[1], 10) : null;
    var character = codepoint ? String.fromCharCode(codepoint) : null;
    return character;
}
alert(characterFromDecimalNumericEntity("&#228;")); // ä

Live Example

* Why "mostly": JavaScript strings are made up of 16-bit "characters" that correspond to UTF-16 code units, not Unicode code points (you can't store a Unicode code point in 16 bits; you need 21). All characters from the Basic Multilingual Plane fit within one UTF-16 code unit, but characters from the Supplementary Multilingual Plane, Supplementary Ideographic Plane, and so on require two UTF-16 code units each. One of those characters will occupy two "characters" in a JavaScript string, so characterFromDecimalNumericEntity above would fail for them. More in the JavaScript spec and the Unicode FAQ.
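In engines that support ES2015, String.fromCodePoint produces the two code units for characters outside the Basic Multilingual Plane. The following is only a sketch under that assumption: decodeNumericEntities is a hypothetical helper name, and it deliberately handles just decimal and hexadecimal numeric entities, wherever they appear in a string.

```javascript
// Hypothetical helper: replaces decimal (&#228;) and hexadecimal (&#xE4;)
// numeric entities anywhere in a string. Named entities such as &auml;
// are left untouched.
function decodeNumericEntities(str) {
    return str.replace(/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/g, function (whole, hex, dec) {
        var codepoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
        // Values beyond the Unicode range are left as they were,
        // because String.fromCodePoint would throw a RangeError for them.
        if (codepoint > 0x10FFFF) {
            return whole;
        }
        return String.fromCodePoint(codepoint);
    });
}

console.log(decodeNumericEntities("M&#228;dchen"));     // Mädchen
console.log(decodeNumericEntities("&#128512;").length); // 2 (one character, two code units)
```

For named entities, handing the string to something that interprets HTML (such as innerHTML, as shown earlier) remains the simpler route.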
