How do I correctly insert unicode in an HTML title using JavaScript?

Question

I'm seeing some weird behavior when setting the title of an HTML page using JavaScript. If I insert HTML character references directly into the title, the Unicode renders correctly, for instance:

<title>&#21543;&#20986;</title>

But if I attempt to use HTML character references via JavaScript, something seems to be converting the & to &amp;, which breaks the encoding and causes the title to be rendered as the literal coded string:

function execTitleChange() {
  document.title = "&#21543;&#20986;";
}

(I should note that this is a little bit of speculation; when I introspect the DOM using Firebug after executing this JavaScript function, that's where I see &amp; instead of &.)

If I use \u-encoded Unicode characters when setting the value from JavaScript, then everything works correctly again:

function execTitleChange() {
  document.title = "\u5427\u51fa";
}

The fact that \u-encoded characters work kind of makes sense to me, since I think that's how JavaScript represents Unicode characters, but I'm stumped as to why the behavior would be different when using the HTML character references.

Answer 1

JavaScript string constants are parsed by the JavaScript parser. Text inside HTML tags is parsed by the HTML parser. The two languages (and, by extension, their parsers) are different, and in particular they have different ways of representing characters by character code.

Thus, what you've discovered is the way reality actually is :-) Use the \u escape notation in JavaScript, and use HTML entities ( &#nnnn; ) in HTML/XML.
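To make the difference concrete, here is a minimal sketch comparing the two strings from the question as the JavaScript parser sees them. The variable names are only for illustration.

```javascript
// JavaScript escape: the parser turns each \uXXXX into one UTF-16 code unit.
const escaped = "\u5427\u51fa";

// HTML character references mean nothing to the JavaScript parser,
// so this string is just 16 ordinary ASCII characters.
const references = "&#21543;&#20986;";

console.log(escaped.length);        // 2
console.log(references.length);     // 16
console.log(escaped.charCodeAt(0)); // 21543 (0x5427)
console.log(escaped.charCodeAt(1)); // 20986 (0x51fa)
```

Assigning `references` to document.title therefore shows the 16 characters literally, while `escaped` shows the two intended characters.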

Edit: the situation can get even more confusing when you're talking about creating or inserting HTML from JavaScript. When you use .innerHTML to update the DOM from JavaScript, you are basically handing HTML source code to the HTML parser for interpretation. For that reason, you can use either JavaScript \u escapes or HTML entities, and things will work (excepting painful issues of character encoding mismatches etc.).
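Unlike .innerHTML, assigning to document.title does not run the HTML parser, so a string that already contains numeric character references has to be decoded first. The following is a minimal sketch under that assumption; `decodeNumericReferences` is a hypothetical helper written for this example, not a built-in or DOM API, and it handles only numeric references.

```javascript
// Hypothetical helper (not a built-in): decode decimal (&#21543;) and
// hexadecimal (&#x5427;) numeric character references in a plain string.
// Named references such as &amp; are deliberately left untouched.
function decodeNumericReferences(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values as-is rather than throwing a RangeError.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

const decoded = decodeNumericReferences("&#21543;&#20986;");
console.log(decoded.length); // 2

// Only touch the DOM when one exists (a browser, not Node.js).
if (typeof document !== "undefined") {
  document.title = decoded;
}
```

When the string is written by hand rather than received from elsewhere, using \u escapes directly is simpler than decoding.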

Finally, note that JavaScript also provides the String.fromCharCode() function to construct strings from numeric character codes.
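As a small sketch of that last point, the decimal codes from the question's character references can be passed straight to String.fromCharCode(). The variable name `title` is just for illustration.

```javascript
// Build the string from the decimal codes used in &#21543;&#20986;.
const title = String.fromCharCode(21543, 20986);

console.log(title === "\u5427\u51fa"); // true
console.log(title.length);             // 2

// Only touch the DOM when one exists (a browser, not Node.js).
if (typeof document !== "undefined") {
  document.title = title;
}
```

Note that String.fromCharCode() works on UTF-16 code units; for characters above U+FFFF, String.fromCodePoint() accepts the full code point.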

Answer 2

The best way to work with Unicode characters in JavaScript is to use the characters themselves, using an editor or other tool that can store them in UTF-8 encoding. You will avoid a lot of confusion. Naturally, you need to properly declare the character encoding of your .js or .html file.

The construct &#21543; has no special meaning in JavaScript; it is just eight ASCII characters. But if your JavaScript code has been embedded into an HTML document, then it will be processed by HTML rules before being passed to the JavaScript interpreter. And the rules vary by HTML version. That is yet another reason to avoid such constructs.

So just write

document.title = "吧出";

(Of course, there are very few situations where you should change the title element content—which is crucial to search engines and many other purposes—in JavaScript, instead of setting it in HTML. But that's beside the point.)


solution1
21 ACCEPTED 2012-08-24 18:05:25

solution2
5 2012-08-24 19:32:52
