
How to make MySQL work “case insensitive” and “accent insensitive” in UTF-8

I have a schema with "utf8 -- UTF-8 Unicode" as its charset and "utf8_spanish_ci" as its collation.

All the tables inside it are InnoDB, with the same charset and collation as mentioned.

Here comes the problem:

With a query like:

SELECT *
FROM people p
WHERE p.NAME LIKE '%jose%';

I get 83 result rows. I should get 84, because I know that many matching rows exist.

Changing the WHERE clause to:

WHERE p.NAME LIKE '%JOSE%';

I get the exact same 83 rows. Combinations like JoSe, Jose, JOSe, etc. all return the same 83 rows.

The problem comes when accents come into play. If I do:

WHERE p.NAME LIKE '%josé%';

I get no results. 0 rows.

But if I do:

WHERE p.NAME LIKE '%JOSÉ%';

I get just one row. This is the only row in which "jose" is stored accented and capitalized.

I've tried josÉ, JoSÉ and other combinations. As long as the accented letter stays capitalized, as it really is stored in the database, the query still returns that one row. If I change "É" to "é", whatever capitalization I use for the rest of JOSE, it returns no rows.

So, conclusions:

  • Case insensitive if no accented ("latin") characters are involved.
  • Case sensitive if accented ("latin") characters appear.
  • Accent sensitive: if I search for JOSE or jose, I only get 83 rows instead of the 84 rows I need.

What do I want?

  • Searching for "jose", "JOSE", "José", "JOSÉ", "JÒSE", "jöse", "JoSÈ", ... should return the 84 rows I know exist. I want my searches to be case insensitive and accent ("latin") insensitive.

Solutions like using COLLATE on the LIKE don't work for me, and I don't know why...

What can I do?

EDIT:

If I do something like:

WHERE p.NAME LIKE '%jose%' COLLATE utf8_general_ci;

I get the error:

COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

And I've changed all the possible collations on the columns too!

And if I do something like:

WHERE p.NAME LIKE _utf8 '%jose%' COLLATE utf8_general_ci;

The same 83 rows are returned, as if I had changed nothing...

You have already tried to use an accent-insensitive collation for your search and ordering.

http://dev.mysql.com/doc/refman/5.0/en/charset-collation-implementations.html

The thing is, your NAME column seems to be stored in the latin1 (8-bit) character set. That's why MySQL is grumbling at you like this:

  COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

You may get the results you want if you try:

 WHERE CONVERT(p.NAME USING utf8) LIKE _utf8 '%jose%' COLLATE utf8_general_ci;

But, be careful!

When you use any kind of function (in this example, CONVERT) on the column in a WHERE clause, you defeat MySQL's attempts to optimize your search with indexes. If this project is going to get large (that is, if you will have lots of rows in your tables), you need to store your data in utf8 format, not latin1. (You probably already know that your LIKE '%whatever%' search term also defeats MySQL's indexing.)
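Before converting anything, it may help to confirm where latin1 is actually coming from, since the error message can be triggered by the connection as well as by the column. The statements below are a minimal sketch, assuming the table is called people as in the question and that utf8_general_ci is the collation you want. The ALTER TABLE step is an assumption about what "store your data in utf8" would look like in practice; take a backup first, and note that it will not repair data that was already written with the wrong encoding.

```sql
-- Show the character set and collation of every column in the table.
SHOW FULL COLUMNS FROM people;

-- Show the character sets and collations used by the server,
-- the current database and the current connection.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- Only if the columns really are latin1: convert the table and its
-- text columns to utf8 (back up the table before running this).
ALTER TABLE people CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
```

If the columns already report utf8 and only the character_set_client or character_set_connection variables show latin1, the table does not need converting and the connection setting is the thing to change.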

Just in case someone else stumbles upon this issue, I have found a way that solves the problem, at least for me.

I am using PHP to insert and retrieve records from the database. Even though my database, tables and columns are utf8, as is the encoding of the PHP files, the truth is that the connection between PHP and MySQL was using latin1. I managed to find this out using

$mysqli->character_set_name();

where $mysqli is your mysqli object.

For the searches to start working as expected, returning accent-insensitive and case-insensitive matches whether or not the characters have accents, I have to explicitly set the character set of the connection.

To do this, you just have to do the following:

$mysqli->set_charset('utf8');

where $mysqli is your mysqli object. If you have a database management class that wraps your database functionality, this is easy to apply to a complete app. If not, you have to set this explicitly everywhere you open a connection.
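For clients other than PHP, such as the mysql command-line client, the same connection-level setting can be made in plain SQL. This is only a sketch: SET NAMES changes character_set_client, character_set_connection and character_set_results for the current session, the same_name alias is made up for illustration, and the literal check assumes your client or terminal really sends UTF-8 bytes. From PHP, set_charset() is still the better choice, since the mysqli documentation recommends it over issuing SET NAMES as a query.

```sql
-- Tell the server that this connection sends and expects utf8.
SET NAMES utf8;

-- Confirm that the connection-level settings changed.
SHOW VARIABLES LIKE 'character_set_%';

-- Sanity check: compare an accented and an unaccented literal.
-- A result of 1 means the two strings are treated as equal.
SELECT 'josé' = 'JOSE' COLLATE utf8_general_ci AS same_name;

-- Re-run the search from the question.
SELECT *
FROM people p
WHERE p.NAME LIKE '%jose%';
```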

I hope this helps someone out, as I was already freaking out about this!
