Fuzzy match in SQL Server when using Chinese_Hong_Kong_Stroke_90_CI_AS collation
Suppose we create a table as follows:
create table my_table (
id int,
city nvarchar(256) collate Chinese_Hong_Kong_Stroke_90_CI_AS)
INSERT INTO my_table (id, city)
VALUES (1, 'Shanghai');
INSERT INTO my_table (id, city)
VALUES (2, 'Shandong');
INSERT INTO my_table (id, city)
VALUES (3, 'Shanxi');
INSERT INTO my_table (id, city)
VALUES (4, 'Shaanxi');
There are now four records in my_table:
id city
1 Shanghai
2 Shandong
3 Shanxi
4 Shaanxi
The following two SQL queries return the same id. How can I avoid this error?
select top 1 id from my_table order by DIFFERENCE(city, 'Shanghai') desc
select top 1 id from my_table order by DIFFERENCE(city, 'Shandong') desc
Another problem:
select top 1 id from my_table order by DIFFERENCE(city, 'Shannxi') desc
This returns 3 when the correct answer should be 4.
The issue is caused by the collation of your column. As per the docs, SOUNDEX and DIFFERENCE are collation dependent.
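To see the effect for yourself, a minimal diagnostic sketch like the one below might help. It assumes the my_table definition from the question; the column aliases (soundex_hk, soundex_latin, diff_hk, diff_latin) are names I made up for illustration.

```sql
-- Compare what SOUNDEX and DIFFERENCE produce for each row
-- under the column's own collation and under a Latin collation.
select id,
       city,
       -- code computed with the column's Chinese_Hong_Kong_Stroke_90_CI_AS collation
       SOUNDEX(city) as soundex_hk,
       -- code computed after overriding the collation for this expression only
       SOUNDEX(city collate SQL_Latin1_General_CP1_CI_AS) as soundex_latin,
       DIFFERENCE(city, 'Shanghai') as diff_hk,
       DIFFERENCE(city collate SQL_Latin1_General_CP1_CI_AS, 'Shanghai') as diff_latin
from my_table
order by id
```

Keep in mind that DIFFERENCE only compares the four-character SOUNDEX codes, so very similar names such as Shanxi and Shaanxi may well end up with the same code under any collation. In that case top 1 picks one of the tied rows arbitrarily unless you add a second sort key to the order by.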
A possible solution is:
select top 1 id
from my_table
order by DIFFERENCE(city collate SQL_Latin1_General_CP1_CI_AS, 'Shanghai') desc
select top 1 id
from my_table
order by DIFFERENCE(city collate SQL_Latin1_General_CP1_CI_AS, 'Shandong') desc
I think I would add another column with a SQL_Latin1_General_CP1_CI_AS collation which stores exactly the same value as city.
That said, it would be interesting to know exactly what you are trying to accomplish, because in your current collation those two words apparently sound exactly the same.
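The extra-column idea mentioned above could be sketched roughly as follows. This is only one way to do it: I am using a computed column here (rather than a separately stored copy) so the value cannot drift out of sync with city, and city_latin is a hypothetical name.

```sql
-- Hypothetical computed column exposing city under a Latin collation.
-- It is derived from city, so there is nothing to keep in sync by hand.
alter table my_table
    add city_latin as (city collate SQL_Latin1_General_CP1_CI_AS);
go

-- Same fuzzy lookup as before, without repeating the collate clause.
select top 1 id
from my_table
order by DIFFERENCE(city_latin, 'Shanghai') desc;
```

The go separator is there because the select references a column that does not exist until the alter table batch has run. If you prefer a real stored column, you would add an nvarchar(256) column with that collation and copy the values over with an update, at the cost of having to maintain it on every change to city.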
It's also worth reading Beyond SOUNDEX & DIFFERENCE.