MySQL REGEXP query - accent-insensitive search

Question

I'm looking to query a database of wine names, many of which contain accents (but not in a uniform way, so similar wines may be entered with or without accents).

The basic query looks like this:

SELECT * FROM `table` WHERE `wine_name` REGEXP '[[:<:]]Faugères[[:>:]]'

which will return entries with 'Faugères' in the title, but not 'Faugeres'.

SELECT * FROM `table` WHERE `wine_name` REGEXP '[[:<:]]Faugeres[[:>:]]'

does the opposite.

I had thought something like:

SELECT * 
FROM `table` 
WHERE `wine_name` REGEXP '[[:<:]]Faug[eèêéë]r[eèêéë]s[[:>:]]'

might do the trick, but this only returns the results without the accents.

The field is collated as utf8_unicode_ci, which from what I've read is how it should be.

Any suggestions?!

Answer 1

You're out of luck:

Warning

The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multi-byte safe and may produce unexpected results with multi-byte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
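To see why the character class in the question only finds the unaccented spelling, here is a minimal Python sketch. It assumes Python's `re` module applied to UTF-8 bytes is a fair stand-in for a byte-wise regex engine; it is not MySQL itself, and `bytewise_match` is a hypothetical helper. The word-boundary markers are left out because Python does not support that syntax.

```python
import re

# Encoded as UTF-8, the class [eèêéë] becomes a set of single BYTES
# (0x65, 0xC3, 0xA8, 0xAA, 0xA9, 0xAB), not a set of five characters.
CLASS_PATTERN = "Faug[eèêéë]r[eèêéë]s".encode("utf-8")


def bytewise_match(pattern: bytes, text: str) -> bool:
    """Return True if the bytes pattern matches anywhere in the UTF-8 bytes of text."""
    return re.search(pattern, text.encode("utf-8")) is not None


# 'e' is one byte, so the class consumes it and 'r' follows as expected.
plain = bytewise_match(CLASS_PATTERN, "Faugeres")

# 'è' is two bytes (0xC3 0xA8); the class consumes only 0xC3,
# then the pattern expects 'r' but finds 0xA8.
accented = bytewise_match(CLASS_PATTERN, "Faugères")

print(plain, accented)
```

Under that assumption, the class can never consume a whole two-byte character, which is consistent with the behaviour described in the question.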

The [[:<:]] and [[:>:]] regexp operators are markers for word boundaries. The closest you can achieve with the LIKE operator is something along these lines:

SELECT *
FROM `table`
WHERE wine_name = 'Faugères'
   OR wine_name LIKE 'Faugères %'
   OR wine_name LIKE '% Faugères'

As you can see, it's not fully equivalent because I've restricted the concept of word boundary to spaces. Adding more clauses for other boundaries would be a mess.

You could also use full text searches (although it isn't the same) but you can't define full text indexes in InnoDB tables (yet).

You're certainly out of luck :)

Addendum: this has changed as of MySQL 8.0:

MySQL implements regular expression support using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. (Prior to MySQL 8.0.4, MySQL used Henry Spencer's implementation of regular expressions, which operates in byte-wise fashion and is not multibyte safe.)

Answer 2

Because REGEXP and RLIKE are byte oriented, have you tried:

SELECT 'Faugères' REGEXP 'Faug(e|è|ê|é|ë)r(e|è|ê|é|ë)s';

This says that exactly one of these alternatives has to appear at that position. Notice that I haven't used the plus (+) because that means ONE OR MORE. Since you only want one, you should not use the plus.
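A rough way to check the alternation idea outside MySQL is the Python sketch below. It assumes that Python's `re` on UTF-8 bytes approximates a byte-wise engine; `bytewise_match` and the sample names are hypothetical. Each alternative is a complete byte sequence, so a two-byte character is consumed as a unit.

```python
import re

# Each branch of the group is the full UTF-8 byte sequence of one character.
ALT = "(e|è|ê|é|ë)"
PATTERN = f"Faug{ALT}r{ALT}s".encode("utf-8")


def bytewise_match(pattern: bytes, text: str) -> bool:
    """Return True if the bytes pattern matches anywhere in the UTF-8 bytes of text."""
    return re.search(pattern, text.encode("utf-8")) is not None


# Sample names chosen for illustration only.
names = ("Faugères", "Faugeres", "Faugéres", "Faugrs")
results = {name: bytewise_match(PATTERN, name) for name in names}

for name, matched in results.items():
    print(name, matched)
```

In this sketch the three spellings with a vowel match and 'Faugrs' does not, because the group requires exactly one alternative at each position.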

Answer 3

utf8_general_ci sees no difference between accented and unaccented characters when sorting. Maybe this is true for searches as well. Also, change REGEXP to LIKE. REGEXP makes a binary comparison.

Answer 4

OK, I just stumbled on this question while searching for something else.

This returns true.

SELECT 'Faugères' REGEXP 'Faug[eèêéë]+r[eèêéë]+s';

Hope it helps.

Adding the '+' tells the regexp to look for one or more occurrences of the characters in the class.
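The '+' works byte-wise because it lets the class consume both bytes of a two-byte character, but it also accepts repeated letters. The Python sketch below illustrates both effects, assuming Python's `re` on UTF-8 bytes behaves like a byte-wise engine; `bytewise_match` and the misspelled sample are hypothetical.

```python
import re

# With '+', the single-byte class may repeat, so it can swallow 0xC3 0xA8 ('è').
PLUS_PATTERN = "Faug[eèêéë]+r[eèêéë]+s".encode("utf-8")


def bytewise_match(pattern: bytes, text: str) -> bool:
    """Return True if the bytes pattern matches anywhere in the UTF-8 bytes of text."""
    return re.search(pattern, text.encode("utf-8")) is not None


# The intended accented spelling matches.
wanted = bytewise_match(PLUS_PATTERN, "Faugères")

# A misspelling with repeated letters matches as well, which may not be desired.
unwanted = bytewise_match(PLUS_PATTERN, "Faugeeeres")

print(wanted, unwanted)
```

If such extra matches matter, the alternation form `(e|è|ê|é|ë)` is the stricter choice.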

Answer 5

To solve this problem, I tried different things, including using the binary keyword or the latin1 character set, but to no avail.
Finally, considering that it is a MySQL bug, I ended up replacing the é and è characters.

Like this:

SELECT * 
FROM `table` 
WHERE replace(replace(wine_name, 'é', 'e'), 'è', 'e') REGEXP '[[:<:]]Faugeres[[:>:]]'

Answer 6

I had the same problem trying to find every record matching one of the following patterns: 'copropriété', 'copropriete', 'COPROPRIÉTÉ', 'Copropri?t?'

REGEXP 'copropri.{1,2}t.{1,2}' worked for me. Basically, .{1,2} should work in every case, whether the character is encoded in 1 or 2 bytes.

Explanation: https://dev.mysql.com/doc/refman/5.7/en/regexp.html

Warning
The REGEXP and RLIKE operators work in byte-wise fashion, so they are not multibyte safe and may produce unexpected results with multibyte character sets. In addition, these operators compare characters by their byte values and accented characters may not compare as equal even if a given collation treats them as equal.
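As a rough check of the `.{1,2}` trick, here is a Python sketch. It assumes Python's `re` on UTF-8 bytes approximates a byte-wise engine, and uses `re.IGNORECASE` (ASCII-only for bytes) as an assumed approximation of case-insensitive matching; `bytewise_match` and the last sample string are hypothetical.

```python
import re

# '.' matches one byte here, so .{1,2} covers a 1-byte or a 2-byte character.
PATTERN = re.compile(b"copropri.{1,2}t.{1,2}", re.IGNORECASE)


def bytewise_match(text: str) -> bool:
    """Return True if the pattern matches anywhere in the UTF-8 bytes of text."""
    return PATTERN.search(text.encode("utf-8")) is not None


candidates = [
    "copropriété",   # two-byte é in both positions
    "copropriete",   # one-byte e in both positions
    "COPROPRIÉTÉ",   # upper case, two-byte É
    "Copropri?t?",   # placeholder characters
    "coproprixxtx",  # unrelated bytes also fit .{1,2}
]
matches = {c: bytewise_match(c) for c in candidates}

for text, matched in matches.items():
    print(text, matched)
```

All five strings match in this sketch, including the last one, so the pattern is permissive: it accepts any one or two bytes in those positions, not only variants of 'e'.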

Answer 7

I have this problem, and went for Álvaro's suggestion above. But in my case, it misses those instances where the search term is the middle word in the string. I went for the equivalent of:

SELECT *
FROM `table`
WHERE wine_name = 'Faugères'
   OR wine_name LIKE 'Faugères %'
   OR wine_name LIKE '% Faugères'
   OR wine_name LIKE '% Faugères %'

7 answers

Answer 1 5 2013-01-03 10:47:37
Answer 2 3 2014-11-14 18:26:30
Answer 3 1 2013-01-03 10:49:34
Answer 4 0 2013-08-16 03:48:03
Answer 5 0 2014-07-21 17:11:38
Answer 6 0 2017-03-29 16:41:52
Answer 7 0 2018-05-30 04:47:38
