
Character Set Special Characters

  • Is ISO-8859-1 a proper subset of UTF-8?
  • What about ISO-8859-n?
  • What about Windows-1252?

If the answer to any of the above is no, which characters are not shared? I'm testing some logic that detects charsets and want to write tests to verify that the detection works properly.

Is ISO-8859-1 a proper subset of UTF-8?

The character repertoire of ISO-8859-1 (the first 256 characters of Unicode, U+0000 to U+00FF) is a proper subset of that of UTF-8 (every Unicode character).
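A quick way to confirm the "first 256 characters" claim is to decode every byte value with Python's built-in `latin-1` codec and check that byte b becomes code point b. This is a sketch that assumes CPython's standard codecs:

```python
# Decode all 256 byte values as ISO-8859-1 in one go.
latin1_chars = bytes(range(256)).decode("latin-1")

# Byte value b maps straight to code point U+0000 + b.
assert len(latin1_chars) == 256
assert all(ord(ch) == b for b, ch in enumerate(latin1_chars))

# Every one of these is a Unicode character, so UTF-8 can encode all of them.
latin1_chars.encode("utf-8")  # does not raise
print("ISO-8859-1 repertoire is exactly U+0000..U+00FF")
```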

Characters U+0000 to U+007F (ASCII) use the same single bytes, 00 to 7F, in both encodings. However, the characters U+0080 to U+00FF are encoded differently in the two encodings:

  • ISO-8859-1 assigns each of these characters a single byte, from 80 to FF.
  • UTF-8 encodes the same characters as two-byte sequences, from C2 80 to C3 BF.
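For detection tests, the practical consequence is that ISO-8859-1 text containing bytes 80 to FF is usually, but not always, invalid UTF-8. The sketch below shows the byte differences and a strict-decode check; `looks_like_utf8` is a hypothetical helper name, not part of any library:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Same character, different bytes (e.g. U+00E9: latin-1 e9, UTF-8 c3 a9).
for ch in ["A", "\u0080", "\u00e9", "\u00ff"]:
    print(f"U+{ord(ch):04X}", ch.encode("latin-1").hex(" "), "|", ch.encode("utf-8").hex(" "))

print(looks_like_utf8("café".encode("utf-8")))    # True
print(looks_like_utf8("café".encode("latin-1")))  # False: lone 0xE9 byte
# Caveat: some Latin-1 text happens to be valid UTF-8 ("Ã©" -> bytes C3 A9).
print(looks_like_utf8("Ã©".encode("latin-1")))    # True
```

The last line is worth a dedicated test case, because a detector that relies only on "does it decode as UTF-8?" will misclassify such input.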

What about ISO-8859-n?

These are 15 different encodings (parts 1 to 16, excluding the abandoned part 12) that together contain 614 distinct characters. Some of these characters occur in multiple "parts" of ISO 8859 and some don't, so you'll have to be more specific.
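To see which parts a particular character belongs to, you can try encoding it with each part's codec. This sketch assumes Python's standard `iso8859_N` codecs, which exist for every part except 12; `parts_containing` is a hypothetical helper name:

```python
# ISO 8859 parts with a Python codec; part 12 was never published.
PARTS = [n for n in range(1, 17) if n != 12]

def parts_containing(ch: str) -> list[int]:
    """Hypothetical helper: ISO 8859 part numbers whose repertoire includes ch."""
    found = []
    for n in PARTS:
        try:
            ch.encode(f"iso8859_{n}")
        except UnicodeEncodeError:
            continue  # ch has no byte in this part
        found.append(n)
    return found

print(parts_containing("A") == PARTS)  # True: ASCII is in every part
print(parts_containing("Ő"))           # Hungarian letter, absent from part 1
```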

I see that your question is tagged ISO-8859-2. The characters that are in ISO-8859-2 but not in ISO-8859-1 are:

ĂăĄąĆćČčĎďĐđĘęĚěĹĺĽľŁłŃńŇňŐőŔŕŘřŚśŞşŠšŢţŤťŮůŰűŹźŻżŽžˇ˘˙˛˝
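A list like this can be generated rather than typed by hand, which is handy for building test fixtures. The sketch below decodes every byte value with two of Python's single-byte codecs and takes the set difference; `repertoire` and `only_in` are hypothetical helper names:

```python
def repertoire(encoding: str) -> set[str]:
    """All characters a single-byte codec can decode, skipping unassigned bytes."""
    chars = set()
    for b in range(256):
        try:
            chars.add(bytes([b]).decode(encoding))
        except UnicodeDecodeError:
            pass  # this byte has no mapping in the codec
    return chars

def only_in(encoding: str, other: str) -> str:
    """Characters in `encoding` but not in `other`, sorted by code point."""
    return "".join(sorted(repertoire(encoding) - repertoire(other)))

print(only_in("iso8859_2", "latin_1"))
```

The same helper answers the Windows-1252 question with `only_in("cp1252", "latin_1")`.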

What about Windows-1252?

Windows-1252 is just like ISO-8859-1, except that it replaces the rarely used control characters in the 0x80-0x9F range with printable characters (five of those bytes, 0x81, 0x8D, 0x8F, 0x90 and 0x9D, are left undefined). The characters that are in Windows-1252 but not in ISO-8859-1 are:

ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™
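Because both encodings accept the same bytes in 0x80-0x9F but read them differently, the same input decodes to different text. This sketch, using Python's `cp1252` and `latin-1` codecs, shows the "smart quote" bytes that often betray Windows-1252 input, and the five undefined bytes that Python's `cp1252` codec rejects:

```python
# 0x93/0x94 are curly quotes and 0x80 is the euro sign in Windows-1252.
sample = b"\x93quoted\x94 \x80 5"

print(sample.decode("cp1252"))         # “quoted” € 5
print(repr(sample.decode("latin-1")))  # C1 control characters instead

# Five bytes in 0x80-0x9F have no assignment in Windows-1252.
for b in (0x81, 0x8D, 0x8F, 0x90, 0x9D):
    try:
        bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        print(f"0x{b:02X} is undefined in cp1252")
```

A detector that sees these bytes can rule out Windows-1252, while ISO-8859-1 still accepts them as control characters.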

Unicode is a superset of all these character sets, and of nearly every established character set. You can find mappings from all of these character sets to Unicode code points here: http://unicode.org/Public/MAPPINGS/.
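If you want your tests to use those mapping files directly, a small parser is enough for the single-byte tables. This sketch assumes the layout used by files such as the Windows code page tables: a hex byte, whitespace, a hex code point, then a `#` comment, with unmapped bytes having no second column. Other files in that directory may use different layouts, and `parse_mapping` is a hypothetical helper name:

```python
def parse_mapping(text: str) -> dict[int, int]:
    """Hypothetical parser: '0xBB<TAB>0xUUUU<TAB>#NAME' lines -> {byte: code point}."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and names
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a mapping, e.g. "0x81  #UNDEFINED"
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table

sample = (
    "#   Name: cp1252 to Unicode table (excerpt)\n"
    "0x80\t0x20AC\t#EURO SIGN\n"
    "0x81\t      \t#UNDEFINED\n"
    "0x82\t0x201A\t#SINGLE LOW-9 QUOTATION MARK\n"
)
print(parse_mapping(sample))  # {128: 8364, 130: 8218}
```

Each parsed entry can then be cross-checked against the codec under test, for example `bytes([b]).decode("cp1252") == chr(cp)`.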
