简体   繁体   English

用ANSI编码html和没有BOM的UTF-8

[英]Encoding html in ANSI vs UTF-8 w/o BOM

What's the difference between writing a eg php or html document in Ansi and UTF-8 without BOM and then uploading them on a webserver? AnsiUTF-8中编写一个例如php或html文档而没有BOM ,然后在网络服务器上上传它们有什么区别? Both document has meta UTF-8 in . 这两个文件都有元UTF-8。

If someone writes simply with notepad, they have to choose Ansi, because notepad doesn't offer UTF-8 without Byde-Order-Mark 如果有人用记事本简单地写,他们必须选择Ansi,因为记事本不提供没有Byde-Order-Mark的UTF-8

The difference is that if you write your file in some 8-bit codepage and then forget to convert it to UTF-8, people might see your web page broken, because you set the charset is set to UTF-8 in meta; 不同之处在于,如果您在一些8位代码页中编写文件然后忘记将其转换为UTF-8,那么人们可能会看到您的网页被破坏,因为您将字符集设置为元数据中的UTF-8; and to apply that bug fix in hurry, you cannot access the file in place using SFTP or WinSCP, because you'd have to convert into 8-bit codepage first again. 并且要快速应用该错误修复,您无法使用SFTP或WinSCP访问该文件,因为您必须再次转换为8位代码页。

Furthermore UTF-8 is Unicode, and the full range of characters is supported, while in "ANSI" codepages then no. 此外,UTF-8是Unicode,支持所有字符,而在“ANSI”代码页中则不支持。 Not all Unicode documents can be converted back to "ANSI" codepages, and thus you could not edit them this way. 并非所有Unicode文档都可以转换回“ANSI”代码页,因此您无法以这种方式编辑它们。

No sane person uses Windows Notepad for serious coding because its lack of functionality, syntax coloring, line ending formats and because of its awful support for character sets. 没有理智的人使用Windows记事本进行严肃的编码,因为它缺乏功能,语法着色,行结尾格式以及它对字符集的可怕支持。

The difference is that UTF-8 and “ANSI” (a Microsoft misnomer for various 8-bit encodings) are completely different encodings, though they coincide for the ASCII code range, 0x00 to 0x7F. 区别在于UTF-8和“ANSI”(微软用于各种8位编码的误称)是完全不同的编码,尽管它们与ASCII码范围0x00到0x7F重合。

It is incorrect to label an “ANSI” file as UTF-8 encoded. 将“ANSI”文件标记为UTF-8编码是不正确的。 The error does not cause observable effects if the data actually contains ASCII characters only or, in most cases, if the file is sent with HTTP headers which specify the correct encoding. 如果数据实际上仅包含ASCII字符,或者在大多数情况下,如果文件是使用指定正确编码的HTTP标头发送的,则错误不会导致可观察到的影响。

There is no reason not to use BOM for UTF-8 encoded HTML files. 没有理由不将BOM用于UTF-8编码的HTML文件。 Pages that claim otherwise are based either on information about browsers that lost all practical impact years ago or on confusing HTML with PHP. 另外声称的页面基于几年前失去所有实际影响的浏览器的信息,或者基于PHP与PHP混淆的信息。 In a PHP file, BOM may cause problems, because PHP software does not handle BOM correctly, ie does not remove it when inserting the content of a file in another. 在PHP文件中,BOM可能会导致问题,因为PHP软件无法正确处理BOM,即在将文件内容插入另一个文件时不会将其删除。

Notepad is indeed unable to save the file as UTF-8 without BOM. 记事本确实无法将文件保存为没有BOM的UTF-8。 Therefore, when creating or editing PHP files, use other programs, such as Notepad++ . 因此,在创建或编辑PHP文件时,请使用其他程序,例如Notepad ++ If you have to use Notepad, you just need to adapt to the limitations: use “ANSI” (after finding out what it is in your environment – it could be windows-1252, or something else), declare it in HTTP headers and meta tags, and use character references to represent characters that cannot be represented in “ANSI”. 如果你必须使用记事本,你只需要适应这些限制:使用“ANSI”(在找到你的环境中的内容之后 - 它可能是windows-1252,或其他东西),在HTTP头和meta声明它标签,并使用字符引用来表示无法用“ANSI”表示的字符。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM