简体   繁体   中英

MySQL - accent insensitive search with polish characters - UTF8

I found alot of answers about collation and accent insensitive search, read like 1000 of posts and articles about this problem, but did not find the answer.

Does anybody know how to force mysql to search accent insensitive with ALL polish characters? Maybe anybody got a compiled collation file for that (Debian)?

Please note, that:

  • Setting collation to utf8_general_ci does not help. It does not support Ł properly. But it does ruin search order.
  • Setting collation to utf8_unicode_ci does not help. Same as above.
  • Editing collation files is not possible because it is multibyte encoding. And multibyte charsets have to be compiled.
  • Replacing all unsupported letters to supported ones is not a solution.

I really don't understand why MySQL crew doesn't threat this as a bug. It's obvious, that it is, and it has been for ages. Since 4.xx they did correct Ś letter... so why not Ł ?!

I found some references to This MySQL functionality , but no information on how to use it. I don't really understand what is being written there and if it can help me.

Tests:

mysql> show full columns from test;
+-------+--------------+----------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type         | Collation      | Null | Key | Default | Extra | Privileges                      | Comment |
+-------+--------------+----------------+------+-----+---------+-------+---------------------------------+---------+
| str   | varchar(255) | utf8_polish_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+-------+--------------+----------------+------+-----+---------+-------+---------------------------------+---------+

mysql> insert into test values('Łomża');

...

mysql> select str from test where str like '%Łomża%'\G
*************************** 1. row ***************************
str: Łomża

mysql> select str from test where str like '%Łomza%'\G
Empty set (0.00 sec)

--

mysql> select str from test where str like '%Łomza%' collate utf8_general_ci\G
*************************** 1. row ***************************
str: Łomża

mysql> select str from test where str like '%Lomza%' collate utf8_general_ci\G
Empty set (0.00 sec)

--

mysql> select str from test where str like '%Łomza%' collate utf8_unicode_ci\G
*************************** 1. row ***************************
str: Łomża

mysql> select str from test where str like '%Lomza%' collate utf8_unicode_ci\G
Empty set (0.00 sec)

I just started to using MySql and the answer for 'Polish' problem is to collate utf8_unicode_520_ci (or utf8mb4_unicode_520_ci ) where l=ł=L=Ł - same for other, not only Polish, accents.

No char conversion, no ascii column, no nothing… After years of searching for ł/Ł sollution in Sqlite.

I recommend making another column for searching in your database, such as 'str_search'. When inserting strings to 'str_search' in your database, make and use a function in PHP like so:

 function convertPolishChars($phrase)
 {
    $phrase = str_replace("ą", "a", $phrase);
    $phrase = str_replace("Ą", "A", $phrase);

    $phrase = str_replace("ć", "c", $phrase);
    $phrase = str_replace("Ć", "C", $phrase);

    $phrase = str_replace("ę", "e", $phrase);
    $phrase = str_replace("Ę", "E", $phrase);

    $phrase = str_replace("ł", "l", $phrase);
    $phrase = str_replace("Ł", "L", $phrase);

    $phrase = str_replace("ń", "n", $phrase);
    $phrase = str_replace("Ń", "N", $phrase);

    $phrase = str_replace("ó", "o", $phrase);
    $phrase = str_replace("Ó", "O", $phrase);

    $phrase = str_replace("ś", "s", $phrase);
    $phrase = str_replace("Ś", "S", $phrase);

    $phrase = str_replace("ź", "z", $phrase);
    $phrase = str_replace("Ź", "Z", $phrase);

    $phrase = str_replace("ż", "z", $phrase);
    $phrase = str_replace("Ż", "Z", $phrase);

    return $phrase;
 }

"INSERT INTO test (str, str_search) VALUES ('Łomża', '" . convertPolishChars('Łomża') . "')"

and when writing a SQL query, write something like:

"SELECT str FROM test WHERE str_search like '%" . convertPolishChars('Łomża') . "%'"

This approach will execute your query string much faster, rather than doing any conversions in the SQL statement.

Make sure to Index your 'str_search' column.

For larger databases, I recommend FULLTEXT searches with MATCH AGAINST . http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM