MySQL accent-insensitive and dotted-character-insensitive search

Question

The problem: I am trying to implement a search that finds matches regardless of whether the search term or the stored value contains dotted characters (letters with diacritics such as ü, ş, ç). In other words, SELECT 'über' = 'uber' and SELECT 'mas' = 'maş' should both return true. This should apply to every character in the following array:

$arr = array('ş' => 's', 'ç' => 'c', 'ö' => 'o', 'ü' => 'u' /* and so on ... */);

The solution I have in mind: alongside the original column, keep a separate column that stores the name in plain English (unaccented) letters. Before storing 'über' in the database, I would convert it to 'uber' in PHP and then store both 'über' (the original) and 'uber' (the searchable form).
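A minimal sketch of that two-column idea, written in Python rather than PHP and using SQLite in place of MySQL so it runs standalone. The fold() helper, the EXTRA table and the fullname_search column are hypothetical names; fold() relies on Unicode NFD decomposition, which splits most dotted letters into a base letter plus a combining mark, and falls back to an explicit map for letters such as the Turkish dotless ı that do not decompose.

```python
import sqlite3
import unicodedata

# Letters that NFD decomposition does not reduce to a plain base letter.
# Hypothetical starting set; extend it for the languages you need.
EXTRA = {"ı": "i", "ß": "ss", "ø": "o", "æ": "ae"}


def fold(text: str) -> str:
    """Lowercase and strip diacritics, e.g. 'Agüeda' -> 'agueda', 'maş' -> 'mas'."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    # NFD splits 'ü' into 'u' + U+0308 and 'ş' into 's' + U+0327; drop the marks.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(EXTRA.get(ch, ch) for ch in stripped)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myusers ("
    " id INTEGER PRIMARY KEY,"
    " fullname TEXT NOT NULL,"          # original, shown to users
    " fullname_search TEXT NOT NULL)"   # folded copy, used for matching
)
for name in ("Agüeda", "Agueda"):
    conn.execute(
        "INSERT INTO myusers (fullname, fullname_search) VALUES (?, ?)",
        (name, fold(name)),
    )


def search(term: str) -> list:
    # Fold the search term the same way, then compare against the folded column.
    rows = conn.execute(
        "SELECT fullname FROM myusers WHERE fullname_search = ? ORDER BY id",
        (fold(term),),
    )
    return [row[0] for row in rows]
```

With this, search("agueda") and search("AGÜEDA") both return ['Agüeda', 'Agueda']. The key design point is that the same fold() is applied on write and on search, so the two sides can never disagree.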

But even though I've searched for this all day, I still believe there must be a simpler, cleaner way to do this, since this approach means storing (more or less) the same data twice. So, do you think this is the only way to go, or do you know a better approach?

EDIT

For accent insensitivity I've seen the posts on SO, and they work, but since I also have to handle the dotted characters, I had to ask this question.

EDIT 2

For various reasons I can't post the exact table structure and code, but here is a close example.

CREATE TABLE `myusers` (
id int auto_increment not null primary key,
email varchar(100) COLLATE latin1_general_ci not null default '',
fullname varchar(75) COLLATE latin1_general_ci not null
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE latin1_general_ci;

That is the table structure. Here are the inserts and selects:

INSERT INTO myusers (fullname) VALUES ('Agüeda');
INSERT INTO myusers (fullname) VALUES ('Agueda');

SELECT * FROM myusers WHERE fullname = 'Agüeda' COLLATE latin1_general_ci;

+----+-------+----------+
| id | email | fullname |
+----+-------+----------+
|  1 |       | Agüeda   |
+----+-------+----------+
1 row in set (0.00 sec)

SELECT * FROM myusers WHERE fullname = 'agueda' COLLATE latin1_general_ci;

+----+-------+----------+
| id | email | fullname |
+----+-------+----------+
|  2 |       | Agueda   |
+----+-------+----------+
1 row in set (0.00 sec)

The desired result, obviously, is that searching for agueda returns both 'Agueda' and 'Agüeda', but that's not what happens. As mentioned above, I have created a new column that stores the full name in English characters, and I search that column as well. But this still costs me two searches (because I also search the original columns, whose matches should rank higher in the results). There must be a better way...
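The two searches can usually be collapsed into one query that matches either column and sorts exact matches on the original column first. A hedged sketch, again with Python's sqlite3 standing in for MySQL: the fullname_search column is assumed to hold the already-folded name (folding happens in application code, as in the question), and the rows are illustrative. The ORDER BY (fullname = ?) DESC trick relies on a comparison evaluating to 1 or 0, which holds in both SQLite and MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myusers (id INTEGER PRIMARY KEY, fullname TEXT, fullname_search TEXT)"
)
# fullname_search is assumed to be pre-folded by the application.
conn.executemany(
    "INSERT INTO myusers (fullname, fullname_search) VALUES (?, ?)",
    [("Agüeda", "agueda"), ("Agueda", "agueda"), ("Maria", "maria")],
)


def search(term: str, folded_term: str) -> list:
    # One query: match on either column, then rank exact matches
    # on the original column above matches found only via the folded column.
    sql = """
        SELECT fullname FROM myusers
        WHERE fullname = ? OR fullname_search = ?
        ORDER BY (fullname = ?) DESC, id
    """
    return [row[0] for row in conn.execute(sql, (term, folded_term, term))]
```

For example, search("Agueda", "agueda") returns ['Agueda', 'Agüeda'], while search("Agüeda", "agueda") returns the same two rows in the opposite order.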

Answer 1

Just use an appropriate collation. For instance:

create table test(
    foo text
) collate = utf8_unicode_ci;
insert into test values('Agüeda');
insert into test values('Agueda');
select * from test where foo = 'Agueda';

This returns both rows.
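To illustrate what "an appropriate collation" means outside MySQL, here is a hedged analog: Python's sqlite3 module can register a custom collation with Connection.create_collation, and a column declared with it compares strings after stripping diacritics and case. The collation name accent_ci and the strip_accents helper are hypothetical. MySQL's utf8_unicode_ci is implemented differently (it follows the Unicode Collation Algorithm), so this only shows the idea that the collation decides which strings compare equal.

```python
import sqlite3
import unicodedata


def strip_accents(s: str) -> str:
    # Decompose (NFD), drop combining marks, then casefold for case-insensitivity.
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if not unicodedata.combining(c)
    ).casefold()


def accent_insensitive(a: str, b: str) -> int:
    # SQLite collation callables return negative, zero or positive, like cmp().
    ka, kb = strip_accents(a), strip_accents(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("accent_ci", accent_insensitive)  # register before use
conn.execute("CREATE TABLE test (foo TEXT COLLATE accent_ci)")
conn.executemany("INSERT INTO test VALUES (?)", [("Agüeda",), ("Agueda",)])

# The column's collation is used for the comparison, so both rows match.
rows = conn.execute("SELECT foo FROM test WHERE foo = 'Agueda' ORDER BY rowid").fetchall()
```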

Answer 2

1) Write your own collation, e.g. latin1_general_diacriticinsensitive. I wouldn't even know where to begin, though :).

2) Use regular expressions and character classes: /[uü]ber/

3) The solution in your mind. I'd personally use this, since design is all about compromise, and this is a simple solution with at most a 100% space overhead for the duplicated column. Granted, the space overhead might eventually turn into a speed overhead, especially with MySQL, but that's something to worry about later. It is also very easy to undo if need be.
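Option 2 can be sketched by generating the character classes from a table of variants. A minimal Python version, assuming a hypothetical VARIANTS table that you would extend for your alphabet; for "über" it produces exactly the [uü]ber pattern above. If you run such a pattern through MySQL's REGEXP, check your version first: before MySQL 8.0, REGEXP worked byte-wise and was documented as not multibyte-safe, so character classes containing non-ASCII letters may misbehave on utf8 data.

```python
import re

# Hypothetical variant table: each base letter and the dotted forms it should match.
VARIANTS = {"s": "sş", "c": "cç", "o": "oö", "u": "uü", "g": "gğ", "i": "iı"}
# Reverse lookup, so a dotted letter in the search term maps to the same group.
BASE = {v: base for base, group in VARIANTS.items() for v in group}


def diacritic_pattern(term: str) -> str:
    parts = []
    for ch in term.lower():
        base = BASE.get(ch)
        if base is None:
            parts.append(re.escape(ch))  # ordinary character, escaped if special
        else:
            parts.append("[" + VARIANTS[base] + "]")  # e.g. 'u' or 'ü' -> [uü]
    return "".join(parts)
```

Both diacritic_pattern("uber") and diacritic_pattern("über") give "[uü]ber", so the user can type either form.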

Answer 3

Well, instead of trying to replace the characters and running the search x times, I'd suggest using MySQL's LIKE operator, i.e.

SELECT * FROM x WHERE search LIKE '_ber';

where you replace each diacritic character with _ .

EDIT: My first version used % here, which was a mistake: % matches any number of characters. Use _ to match a single character.
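A sketch of building such a LIKE pattern from user input, in Python with SQLite standing in for MySQL. The AMBIGUOUS set is a hypothetical list of letters that may or may not carry a diacritic; each one becomes _, and literal % and _ typed by the user are escaped. The base letters have to be replaced too, otherwise a search for "agueda" would never match "Agüeda".

```python
import sqlite3

# Hypothetical set of letters that may appear with or without a diacritic.
AMBIGUOUS = set("sşcçoöuügğ")


def like_pattern(term: str) -> str:
    out = []
    for ch in term.lower():
        if ch in AMBIGUOUS:
            out.append("_")         # _ matches exactly one character
        elif ch in "%_\\":
            out.append("\\" + ch)   # escape LIKE wildcards in user input
        else:
            out.append(ch)
    return "".join(out)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myusers (id INTEGER PRIMARY KEY, fullname TEXT)")
conn.executemany(
    "INSERT INTO myusers (fullname) VALUES (?)",
    [("Agüeda",), ("Agueda",), ("Agaeda",)],
)


def search(term: str) -> list:
    sql = "SELECT fullname FROM myusers WHERE fullname LIKE ? ESCAPE '\\' ORDER BY id"
    return [row[0] for row in conn.execute(sql, (like_pattern(term),))]
```

Here search("agueda") returns ['Agüeda', 'Agueda', 'Agaeda']. The third row shows the trade-off: _ accepts any single character, so the result set is broader than intended and may need filtering afterwards.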

Answer 4

Take a look at this post: https://stackoverflow.com/questions/500826

He has exactly the opposite problem from yours. Look at the WHERE clause in the accepted answer. You could probably just use the _ci suffix and it will work.

Let us know how this gets resolved.

Answers (4 in total):

Answer 1: score 2, 2011-10-10 10:28:44
Answer 2: score 2, accepted, 2011-10-14 00:44:21
Answer 3: score 0, 2011-10-09 22:58:13
Answer 4: score 0, 2011-10-09 23:26:54
