MySQL: Collate in query - any side effects?

My OpenCart table collation is utf8_bin, so unfortunately I can't search for product names that contain accents. I searched on Google and found only that the collation must be utf8_general_ci for accent-insensitive and case-insensitive search.

What if I add a COLLATE declaration to the search query?

SELECT * 
FROM  `address` 
COLLATE utf8_general_ci
LIMIT 0 , 30

Does it have any (bad) side effects? I read about problems with indexing and performance. Or is it totally safe?

I'm afraid you have to consider the side effects on query performance, especially those using indexes. Here is a simple test:

mysql> create table aaa (a1 varchar(100) collate latin1_general_ci, tot int);
insert into aaa values('test1',3) , ('test2',4), ('test5',5);

mysql> create index aindex on aaa (a1);
Query OK, 0 rows affected (0.59 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc aaa;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| a1    | varchar(100) | YES  | MUL | NULL    |       |
| tot   | int(11)      | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
2 rows in set (0.53 sec)


mysql> explain select * from aaa where a1='test1' ;
+----+-------------+-------+------+---------------+--------+---------+-------+------+-----------------------+
| id | select_type | table | type | possible_keys | key    | key_len | ref   | rows | Extra                 |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-----------------------+
|  1 | SIMPLE      | aaa   | ref  | aindex        | aindex | 103     | const |    1 | Using index condition |
+----+-------------+-------+------+---------------+--------+---------+-------+------+-----------------------+
1 row in set (0.13 sec)

mysql> explain select * from aaa where a1='test1' collate utf8_general_ci;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | aaa   | ALL  | NULL          | NULL | NULL    | NULL |    3 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.06 sec)

You can see that MySQL stops using the index on a1 when you search it with a different collation, which can be a huge problem for you.

To make sure your indexes are being used for queries, you may have to change your column collation to the most frequently used one.
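
Before changing anything, it may help to confirm which collation each column currently has. A minimal sketch against the aaa test table above (swap in your own table name):

```sql
-- The Collation column of the output shows the per-column collation.
SHOW FULL COLUMNS FROM aaa;

-- The same information from information_schema, limited to the current database.
SELECT column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = DATABASE()
  AND table_name = 'aaa';
```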

I couldn't find any documented use of COLLATE in a SQL statement the way you wrote it. As for your main question about the effects of using collations, I found some tips, but first:

From dev.mysql.com :

Nonbinary strings (as stored in the CHAR , VARCHAR , and TEXT data types) have a character set and collation. A given character set can have several collations, each of which defines a particular sorting and comparison order for the characters in the set.

  1. Collation is merely the ordering that is used for string comparisons—it has (almost) nothing to do with the character encoding that is used for data storage. I say almost because collations can only be used with certain character sets, so changing collation may force a change in the character encoding.
    To the extent that the character encoding is modified, MySQL will correctly re-encode values to the new character set whether going from single to multi-byte or vice-versa. Beware that any values that become too large for the column will be truncated. [1]
  2. The practical advantage of binary collation is its speed, as string comparison is very simple/fast. In the general case, indexes with a binary collation might not produce the expected results for sorting; for exact matches, however, they can be useful. [2]
  3. With multiple operands, there can be ambiguity. For example:

     SELECT x FROM T WHERE x = 'Y'; 

    Should the comparison use the collation of the column x , or of the string literal 'Y' ? Both x and 'Y' have collations, so which collation takes precedence?
    Standard SQL resolves such questions using what used to be called “coercibility” rules. [3]

  4. If you change the collation of a field in the query, ORDER BY (and likewise WHERE) cannot use any INDEX; hence it could be surprisingly inefficient. [4]
  5. Since the forced collation is defined over the same character set as the column's encoding, there won't be any performance impact (versus defining that collation as the column's default; whereas utf8_general_ci will almost certainly perform slower in comparisons than utf8_bin due to the extra lookups/computation required).
    However, if one forced a collation that is defined over a different character set, MySQL would have to transcode the column's values (which would have a performance impact). [5]
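
To see the coercibility rules from point 3 in action, MySQL's COERCIBILITY() function reports the value for any expression; the operand with the lower value decides the collation. A small sketch using the aaa table from the earlier test (the numbers in the comments are the ones listed in the MySQL manual, so treat them as documented values rather than output I'm promising for every version):

```sql
-- Column value: implicit collation, coercibility 2.
SELECT COERCIBILITY(a1) FROM aaa LIMIT 1;

-- Plain string literal: coercibility 4, so the column's collation wins over it.
SELECT COERCIBILITY('test1');

-- Explicit COLLATE clause: coercibility 0, so it overrides the column's collation.
SELECT COERCIBILITY(_utf8'test1' COLLATE utf8_general_ci);
```

This is why adding COLLATE to either side of a comparison changes the collation used for the whole comparison.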

If practical, change the column definition(s).

ALTER TABLE tbl
    MODIFY col VARCHAR(...) COLLATE utf8_general_ci ...;

(You should include anything else that was already in the column definition.) If you have multiple columns to modify, do them all in the same ALTER (for speed).
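
A sketch of what that could look like with two columns. The column names, types, and NOT NULL attributes here are hypothetical; copy the real definitions from SHOW CREATE TABLE first so nothing is lost:

```sql
-- Look up the existing column definitions before modifying them.
SHOW CREATE TABLE tbl;

-- Hypothetical columns: `name` and `description`. Both are changed in one
-- ALTER so the table is rebuilt only once.
ALTER TABLE tbl
    MODIFY name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL,
    MODIFY description TEXT CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
```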

If, for some reason, you cannot do the ALTER, then, yes, you can tweak the SELECT to change the collation.

The SELECT you mentioned had no WHERE clause for filtering, so let me change the test case:

Let's say you have this, which will find only 'San Jose':

SELECT *
    FROM tbl
    WHERE city = 'San Jose'

To include San José :

SELECT *
    FROM tbl
    WHERE city COLLATE utf8_general_ci = 'San Jose'

If you might have "combining accents", consider using utf8_unicode_ci. More on Combining Diacriticals and More on your topic .
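
If you want to check how a collation treats accents and case before touching any table, you can compare literals directly. A minimal sketch; CONVERT(... USING utf8) is used so the literals are utf8 whatever the connection character set is, assuming your client really sends the é in the character set it has declared:

```sql
-- utf8_bin compares byte by byte: accent and case both matter.
SELECT CONVERT('San José' USING utf8) COLLATE utf8_bin
     = CONVERT('San Jose' USING utf8) AS bin_accent_match;      -- 0

-- utf8_general_ci treats é as e and ignores case.
SELECT CONVERT('San José' USING utf8) COLLATE utf8_general_ci
     = CONVERT('San Jose' USING utf8) AS ci_accent_match;       -- 1
SELECT CONVERT('san jose' USING utf8) COLLATE utf8_general_ci
     = CONVERT('San Jose' USING utf8) AS ci_case_match;         -- 1
```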

As for side effects? None, except for one potentially big one: the index on the column cannot be used. In my second SELECT (above), INDEX(city) is useless. The ALTER avoids this performance penalty on the SELECT, but the one-time ALTER itself is costly.

This might help: UTF-8: General? Bin? Unicode? Please note that utf8_bin is also case sensitive. So I would go for altering table collation to utf8_general_ci and have peace of mind for the future.

