
SQL function to determine the most accurate result

If I have a table like this:

create table #words (
id int identity,
word varchar(1024)
)

insert into #words (word) values ('dock')
insert into #words (word) values ('dockable')

and I do a LIKE query:

select id, word from #words where word like '%dock%'

Is there a way to tell which result would be the most accurate?

For complex multi-word criteria you should use Full Text Search and CONTAINSTABLE. The output of this table function contains a RANK column:

The table produced by CONTAINSTABLE includes a column named RANK. The RANK column is a value (from 0 through 1000) for each row indicating how well a row matched the selection criteria. This rank value is typically used in one of these ways in the SELECT statement:

  • In the ORDER BY clause to return the highest-ranking rows as the first rows in the table.
  • In the select list to see the rank value assigned to each row.
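
As a rough sketch of both uses, the query below joins the CONTAINSTABLE output back to the base table and sorts by rank. It assumes a permanent table named dbo.Words (a hypothetical stand-in for #words) that already has a full-text index on the word column, keyed on id; neither the table name nor the index appears in the question.

```sql
-- Assumption: dbo.Words(id, word) is a permanent table with a full-text
-- index on word whose key column is id. The name is hypothetical.
SELECT w.id,
       w.word,
       ft.[RANK]                       -- rank value in the select list
FROM dbo.Words AS w
INNER JOIN CONTAINSTABLE(dbo.Words, word, '"dock*"') AS ft
        ON w.id = ft.[KEY]             -- KEY holds the full-text key value
ORDER BY ft.[RANK] DESC;               -- highest-ranking rows first
```

The prefix term '"dock*"' matches words that start with "dock"; it will not find "dock" in the middle of a word the way LIKE '%dock%' does, so the two queries are not equivalent.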

For simple single-word criteria you should implement a Levenshtein distance function in SQL CLR and use that to find the most similar words (or use the one from Ken Redler's linked project).

You could try using similarity metrics to get a distance score for each result as compared to the search string. SOUNDEX and the like give you some primitive options, but there are much more sophisticated alternatives, depending on your requirements. The SimMetrics library of functions allows you to compare strings by Hamming distance, Levenshtein distance, etc. Here's a thorough article describing the installation and usage of the library.

You can use the SOUNDEX and DIFFERENCE T-SQL functions to compare words, but you may still need a way to determine which is "most accurate".

For example, run the following queries:

SELECT DIFFERENCE('dock','dock');
SELECT DIFFERENCE('dock','dockable');

DIFFERENCE returns 4 for 'dock' and 'dock', which is the best possible result; for 'dock' and 'dockable' it returns 2, which indicates a weaker match. (DIFFERENCE measures similarity on a scale of 0 to 4, so a higher value means a closer match.)
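
One way to apply this to the original LIKE query is to sort the matches by their DIFFERENCE score. This is only a sketch against the #words table from the question; the column alias and the length-based tie-breaker are assumptions added for illustration, not part of the answer above.

```sql
-- Rank the LIKE matches by phonetic similarity to the search term.
SELECT id,
       word,
       DIFFERENCE(word, 'dock') AS similarity   -- 0 (weakest) to 4 (strongest)
FROM #words
WHERE word LIKE '%dock%'
ORDER BY similarity DESC,                        -- closest SOUNDEX match first
         LEN(word) ASC;                          -- assumed tie-breaker: shorter word wins
```

With the two sample rows, 'dock' is listed ahead of 'dockable'. Because SOUNDEX only encodes the first few consonant sounds, many different words can share the same score, which is why some tie-breaker is usually needed.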

I would look at using Full Text Search (FTS); CONTAINS is more precise than FREETEXT.

CONTAINS

WHERE CONTAINS(word, 'dock') 

FREETEXT

WHERE FREETEXT (word, 'dock') 

Once the column is full-text indexed, these will be faster than LIKE, and FTS includes a score value, computed by a ranking algorithm, that you can use to order the matches. You'll have to test and see if the results fit your needs.
