
MySQL case sensitivity with utf8_general_ci

I have a MySQL database that uses utf8_general_ci (which is case-insensitive), and some of my tables have columns, such as ID, that hold case-sensitive data (for example 'iSZ6fX' or 'AscSc2').

To distinguish uppercase from lowercase, it is better to set utf8_bin on only these columns, like this:

CREATE TABLE  `test` (
`id` VARCHAR( 32 ) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL ,
`value1` VARCHAR( 255 ) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL
) ENGINE = MYISAM CHARACTER SET utf8 COLLATE utf8_general_ci

Or use utf8_general_ci on all columns and use BINARY in the PHP query, for example:

mysqli_query( $link, "SELECT * FROM test WHERE BINARY id = 'iSZ6fX'" );

It is better to use the utf8_bin collation because, even though it cannot happen in UTF-8, in general it is theoretically possible (as happens with UTF-16) for the byte order of an encoding to differ from the order of the characters it represents. A byte-by-byte binary comparison would not account for that, but a binary collation would. As documented under Unicode Character Sets:

There is a difference between “ordering by the character's code value” and “ordering by the character's binary representation,” a difference that appears only with utf16_bin, because of surrogates.

Suppose that utf16_bin (the binary collation for utf16) was a binary comparison “byte by byte” rather than “character by character.” If that were so, the order of characters in utf16_bin would differ from the order in utf8_bin. For example, the following chart shows two rare characters. The first character is in the range E000-FFFF, so it is greater than a surrogate but less than a supplementary. The second character is a supplementary.

Code point  Character                    utf8         utf16
----------  ---------------------------  -----------  -----------
0FF9D       HALFWIDTH KATAKANA LETTER N  EF BE 9D     FF 9D
10384       UGARITIC LETTER DELTA        F0 90 8E 84  D8 00 DF 84

The two characters in the chart are in order by code point value because 0xff9d < 0x10384. They are also in order by utf8 value because 0xef < 0xf0. But they are not in order by utf16 value if we use byte-by-byte comparison, because 0xff > 0xd8.
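The byte sequences and the three orderings above can be checked with a short Python sketch. It runs outside MySQL and only illustrates the arithmetic. It assumes big-endian UTF-16 (Python's `utf-16-be` codec), which is the byte order the chart shows.

```python
# Compare the two characters from the chart by code point, by UTF-8 bytes,
# and by UTF-16 (big-endian) bytes.
katakana_n = "\uff9d"   # U+FF9D HALFWIDTH KATAKANA LETTER N
delta = "\U00010384"    # U+10384 UGARITIC LETTER DELTA


def hex_bytes(s: str, encoding: str) -> str:
    """Return the encoded bytes of s as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in s.encode(encoding))


for ch in (katakana_n, delta):
    print(f"{ord(ch):05X}", hex_bytes(ch, "utf-8"), "|", hex_bytes(ch, "utf-16-be"))

# Ordering by code point value: 0xFF9D < 0x10384
by_code_point = ord(katakana_n) < ord(delta)
# Ordering by UTF-8 bytes: first bytes 0xEF < 0xF0
by_utf8 = katakana_n.encode("utf-8") < delta.encode("utf-8")
# Ordering by UTF-16 bytes, byte by byte: first bytes 0xFF > 0xD8, so the order flips
by_utf16_bytes = katakana_n.encode("utf-16-be") < delta.encode("utf-16-be")

print(by_code_point, by_utf8, by_utf16_bytes)  # True True False
```

The loop prints `0FF9D EF BE 9D | FF 9D` and `10384 F0 90 8E 84 | D8 00 DF 84`, matching the chart.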

So MySQL's utf16_bin collation is not “byte by byte.” It is “by code point.” When MySQL sees a supplementary-character encoding in utf16, it converts it to the character's code-point value and then compares. Therefore, utf8_bin and utf16_bin produce the same ordering. This is consistent with the SQL:2008 standard requirement for a UCS_BASIC collation: “UCS_BASIC is a collation in which the ordering is determined entirely by the Unicode scalar values of the characters in the strings being sorted. It is applicable to the UCS character repertoire. Since every character repertoire is a subset of the UCS repertoire, the UCS_BASIC collation is potentially applicable to every character set. NOTE 11: The Unicode scalar value of a character is its code point treated as an unsigned integer.”
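The “by code point” rule can be sketched in Python. The helper names are hypothetical, and this is not MySQL's implementation. It assumes well-formed big-endian UTF-16 input. The sketch only shows that decoding surrogate pairs before comparing restores the code-point order that utf8_bin already has.

```python
from typing import List


def utf16be_code_points(data: bytes) -> List[int]:
    """Decode well-formed UTF-16BE bytes into a list of code points.

    A high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF)
    is combined into one supplementary code point; other 16-bit units
    are taken as code points directly.
    """
    # Split the byte string into big-endian 16-bit code units.
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    points: List[int] = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: combine it with the following low surrogate.
            low = units[i + 1]
            points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            points.append(unit)
            i += 1
    return points


def utf16_code_point_less(a: bytes, b: bytes) -> bool:
    """Order two UTF-16BE strings by code point instead of byte by byte."""
    return utf16be_code_points(a) < utf16be_code_points(b)


katakana_n_utf16 = "\uff9d".encode("utf-16-be")  # FF 9D
delta_utf16 = "\U00010384".encode("utf-16-be")   # D8 00 DF 84

print(katakana_n_utf16 < delta_utf16)                        # False: byte-by-byte order
print(utf16_code_point_less(katakana_n_utf16, delta_utf16))  # True: same order as utf8_bin
```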

Therefore, if comparisons involving these columns will always be case-sensitive, you should set the columns' collation to utf8_bin (so that they remain case-sensitive even if you forget to specify otherwise in your query). Or, if only particular queries are case-sensitive, you can specify the utf8_bin collation for those queries with the COLLATE keyword:

SELECT * FROM test WHERE id = 'iSZ6fX' COLLATE utf8_bin
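As a rough mental model only, not MySQL's real collation tables, the difference between the two collations for a lookup on `id` can be sketched in Python. The rows and function names are hypothetical. The case-insensitive side is approximated with `str.casefold()`.

```python
# Hypothetical rows of an `id` column; the first two differ only in letter case.
rows = ["iSZ6fX", "isz6fx", "AscSc2"]


def where_id_ci(rows, target):
    """Rough stand-in for `WHERE id = target` under utf8_general_ci.

    Assumption: letter case is ignored via str.casefold(). The real collation
    has further rules (for example, many accented letters compare equal to
    their base letters), which this sketch does not model.
    """
    return [r for r in rows if r.casefold() == target.casefold()]


def where_id_bin(rows, target):
    """Rough stand-in for `WHERE id = target COLLATE utf8_bin`.

    Characters, including letter case, must match exactly.
    Trailing-space handling is not modeled.
    """
    return [r for r in rows if r == target]


print(where_id_ci(rows, "iSZ6fX"))   # ['iSZ6fX', 'isz6fx']
print(where_id_bin(rows, "iSZ6fX"))  # ['iSZ6fX']
```

The first result shows the risk for case-sensitive IDs: under a case-insensitive collation, a lookup for one ID also matches another. For the same reason, a unique index would treat the two values as duplicates.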

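It is better to give the column the utf8_bin collation than to specify it in the query's condition, because that reduces the chance of mistakes.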

The effect of BINARY as a column attribute differs from its effect prior to MySQL 4.1. Formerly, BINARY resulted in a column that was treated as a binary string. A binary string is a string of bytes that has no character set or collation, which differs from a nonbinary character string that has a binary collation.

But now:

The BINARY operator casts the string following it to a binary string. This is an easy way to force a comparison to be done byte by byte rather than character by character. BINARY also causes trailing spaces to be significant. BINARY str is shorthand for CAST(str AS BINARY).
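The trailing-space point can be illustrated with a Python sketch that models the two kinds of equality. It is an approximation, not MySQL code, and the function names are hypothetical. It assumes the column uses a PAD SPACE collation, which applies to both utf8_bin and utf8_general_ci: the shorter string is compared as if padded with spaces. BINARY, by contrast, compares raw bytes.

```python
def pad_space_equal(a: str, b: str) -> bool:
    """Approximates `a = b` under a PAD SPACE collation such as utf8_bin.

    The shorter string is treated as if padded with trailing spaces, so
    trailing spaces do not affect equality (letter case still does).
    """
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)


def binary_equal(a: str, b: str) -> bool:
    """Approximates `BINARY a = b`: raw bytes are compared, so trailing spaces count."""
    return a.encode("utf-8") == b.encode("utf-8")


print(pad_space_equal("iSZ6fX", "iSZ6fX "))  # True
print(binary_equal("iSZ6fX", "iSZ6fX "))     # False
```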

The BINARY attribute in character column definitions has a different effect. A character column defined with the BINARY attribute is assigned the binary collation of the column character set. Every character set has a binary collation. For example, the binary collation for the latin1 character set is latin1_bin, so if the table default character set is latin1, these two column definitions are equivalent:

CHAR(10) BINARY

CHAR(10) CHARACTER SET latin1 COLLATE latin1_bin

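Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.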

© 2020-2024 STACKOOM.COM