Accent-insensitive search on a problematic database

I have a database that contains data in different languages. Some languages use accents (like áéíóú), and I need to search this data as if the accents didn't exist (a search for 'campeon' should return 'campeón' as a valid result).
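Once the data is stored correctly, MySQL's utf8_unicode_ci collation treats 'o' and 'ó' as equal, so it does this folding by itself. Purely to illustrate what "search as if the accents didn't exist" means, here is a minimal Python sketch that decomposes text with Unicode NFD and drops the combining accent marks. The function names are hypothetical, and this does not reproduce MySQL's collation rules exactly.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: strip accents and case for comparison."""
    # NFD splits 'ó' into 'o' followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def accent_insensitive_search(needle: str, haystack: list[str]) -> list[str]:
    """Hypothetical helper: substring search that ignores accents and case."""
    key = fold_accents(needle)
    return [item for item in haystack if key in fold_accents(item)]


words = ["campeón", "campana", "Campeonato", "camión"]
print(accent_insensitive_search("campeon", words))  # ['campeón', 'Campeonato']
```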

The problem is that the tables in my database (utf8_unicode_ci) are not storing the UTF-8 characters correctly. If you look at the data through phpMyAdmin, the words with accents look garbled (for example, campeÃ³n instead of campeón).

After some research, I found (in a StackOverflow question) that the problem is caused by the missing SET NAMES [charset] statement on the connection. In fact, I've done some testing, and if I run SET NAMES utf8, everything works as expected.
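A minimal Python simulation of how the garbled values arise, assuming the connection defaulted to latin1 (common before MySQL 8.0) because no SET NAMES was issued: the application sends UTF-8 bytes, the server reads each byte as a separate latin1 character, and the utf8 column stores those characters. Python's latin-1 codec stands in for MySQL's latin1, which is really cp1252; the two agree for the bytes that make up á, é, í, ó, ú and ñ, but not for every character. The function name is hypothetical.

```python
def store_over_latin1_connection(text: str) -> str:
    """Hypothetical helper: simulate inserting `text` over a latin1 connection.

    The application sends UTF-8 bytes, but the server decodes them as latin1,
    so each byte of a multi-byte character becomes a character of its own.
    """
    client_bytes = text.encode("utf-8")    # bytes the application actually sends
    return client_bytes.decode("latin-1")  # characters the server stores in the utf8 column


stored = store_over_latin1_connection("campeón")
print(stored)  # campeÃ³n, which is what phpMyAdmin shows over a utf8 connection
```

ASCII-only strings pass through unchanged, which is why only the accented words look wrong.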

So I have the solution; what's the problem? The problem is that the database is in production, so it already holds thousands of strings. If I change the character set the client uses, all the existing strings will become invalid (they will be read back garbled). The question is: is there any way to:

  1. perform accent-insensitive searches in a database that uses the wrong charset, like mine?
  2. safely transform the data in the tables to the appropriate charset?
  3. continue working with mixed charsets (latin1 and utf8) in the database, accepting that searches on the latin1 data will not be accent-insensitive?

If anybody has experience with any of the solutions I propose, or has a new one, I'd be very grateful if you could share it.

Since the problem is that the data was inserted using the wrong connection encoding, you can fix it by:

  1. Exporting the data using the same wrong connection encoding you have been using so far, and then
  2. Importing the data using the correct utf8 connection encoding.
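The round trip works because reading a column back over the old latin1 connection turns each stored character back into its original byte, and importing those bytes over a utf8 connection decodes them as the UTF-8 they always were. Below is a minimal Python sketch of that per-value transformation, under the same latin-1-for-cp1252 approximation. The function name is hypothetical, and the fallback (leave a value unchanged if it cannot be reversed) is an assumption aimed at tables where only some rows were double-encoded, which bears on the third option in the question. It is a heuristic: a correctly stored value that happens to contain a sequence such as "Ã³" would also be altered.

```python
def repair_double_encoded(stored: str) -> str:
    """Hypothetical helper: undo one layer of UTF-8-read-as-latin1 encoding.

    Mirrors export over the latin1 connection (characters -> original bytes)
    followed by import over a utf8 connection (bytes -> UTF-8 text).
    """
    try:
        original_bytes = stored.encode("latin-1")  # "export" over latin1
        return original_bytes.decode("utf-8")      # "import" over utf8
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not reversible: most likely the value was already stored correctly.
        return stored


rows = ["campe\u00c3\u00b3n", "campeón", "campana"]
print([repair_double_encoded(r) for r in rows])  # ['campeón', 'campeón', 'campana']
```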

That will fix the encoding problem, after which search will work as expected: utf8_unicode_ci is an accent-insensitive collation, so a query for 'campeon' will also match 'campeón'.

What if you create a copy of the table at the beginning of your session, alter the copy's charset, run all your queries against the copy, and then drop it at the end of the session? I don't know how practical this would be; it depends on how often you need to run these queries and how big the table is.

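Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.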

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM