
Should I encode HTML special characters when using UTF-8?

I heard recently that it is no longer required to encode HTML special characters when using the UTF-8 (or Unicode) charset on a web site.

I mean non-ASCII characters, such as « (&laquo;), — (&mdash;), and similar. Characters reserved in HTML should be escaped of course (&gt;, &quot;, and so on).

If it is true, preparing large texts for publishing on the web would be much easier than before.

It has never been necessary to “encode” characters (escape them with character references like &#8212; or entity references like &mdash;) when using UTF-8, for as long as browsers have supported UTF-8 at all. The only exceptions are the less-than sign “<” and the ampersand “&”, which must be escaped regardless of the encoding. (A quotation mark also cannot appear as such inside a quoted attribute value that uses the same mark as its delimiter, but this can usually be avoided.)
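As a rough illustration of the rule above, here is a minimal Python sketch using the standard library's `html.escape`. The helper names `escape_text` and `escape_attr` and the sample strings are made up for this example. Note that `html.escape` also escapes “>”, which is harmless but not strictly required.

```python
import html

def escape_text(text: str) -> str:
    # Hypothetical helper for element content: escapes &, < and >,
    # leaves quotation marks and all non-ASCII characters untouched.
    return html.escape(text, quote=False)

def escape_attr(value: str) -> str:
    # Hypothetical helper for quoted attribute values: additionally
    # escapes " and ' so the value cannot end the attribute early.
    return html.escape(value, quote=True)

sample = "«Tom & Jerry» — 1 < 2"
escaped_sample = escape_text(sample)
escaped_title = escape_attr('say "hi"')

print(escaped_sample)  # «Tom &amp; Jerry» — 1 &lt; 2
print(escaped_title)   # say &quot;hi&quot;
```

The « » and — characters pass through unchanged. As long as the page is saved and served as UTF-8, they can be written directly.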

You may still use escape notations. You might do so if you expect that you, or someone else, will have to edit the HTML document with authoring tools that lack proper UTF-8 support. You might also do so because you are typing text and have no handy tool for inserting the characters directly. But these are exceptions.
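For the exceptional case where the file must stay pure ASCII, one possible approach (a sketch, not the only way) is Python's built-in `xmlcharrefreplace` error handler, which turns every non-ASCII character into a numeric character reference. The function name `to_ascii_with_char_refs` is hypothetical.

```python
def to_ascii_with_char_refs(text: str) -> str:
    # Replace each non-ASCII character with a numeric character
    # reference such as &#8212;. ASCII characters are left as they are,
    # so & and < must still be escaped separately beforehand.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

ascii_safe = to_ascii_with_char_refs("«quote» — done")
print(ascii_safe)  # &#171;quote&#187; &#8212; done
```

The result contains only ASCII bytes, so it survives tools that mishandle UTF-8, at the cost of being harder to read in the source.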

It's true.

Using HTML entities (except for the reserved characters) has been out of fashion since UTF-8 took over.
