SQL Server Full-Text Rankings Example

So far, I'm not getting meaningful results from my full-text queries, so I decided to give a simple example of what I am trying to do and the results I expect.

I've made the following test table (tblCars), with full-text indexing enabled on the column [Car] and with [CarID] as the primary key.

CarID Car
----- -----------------
9     BMW 330Ci 2009
14    AUDI A4 2010
16    AUDI A3 2.0T 2009

I want to run a ranked search for the terms 'audi OR bmw', and I expect to get equal rankings on all search results.

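-- Search condition taken from the description above
DECLARE @SearchOr nvarchar(100) = N'audi OR bmw';

-- Join the base table to the ranked full-text matches on the key column
SELECT tblCars.*, tblSearch.Ranked
FROM tblCars
    INNER JOIN
    (SELECT [KEY] AS CarID, [RANK] AS Ranked FROM CONTAINSTABLE
    (tblCars, Car, @SearchOr))
    tblSearch ON tblCars.CarID = tblSearch.CarID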

Instead I get this:

CarID Car                Ranked
----- ------------------ -------
9     BMW 330Ci 2009     48
14    AUDI A4 2010       32
16    AUDI A3 2.0T 2009  32

In fact, no matter what combination of ORs I use, the BMW is always ranked higher than or equal to the AUDI, even when that seems totally illogical. I've tried using some ANDs in my search term and it still gives strange results, with the BMW always ranking more favorably than expected.

Can anyone point out where I'm going wrong? I'm thinking my expectations must be all wrong, but I can't imagine how I'm going to get well-ranked results for a large table.

Obviously, Microsoft believes that BMW is a superior car to the Audi. :-)

OK, seriously, there are many factors that go into calculating the RANK returned, which is a unitless number from 0 to 1000. Full-text search primarily uses the Jaccard index for calculating ranks. Other factors taken into consideration include document length (other factors being equal, shorter documents will rank higher than longer documents) and the number of occurrences of the search word or phrase in the document.
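To make the Jaccard index and the document-length effect concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the helper names `tokens` and `jaccard` are made up, the text is split on whitespace rather than with SQL Server's word breaker, and the score is the plain Jaccard index (size of the intersection over size of the union), not SQL Server's actual rank computation. It is not expected to reproduce the 48/32/32 values from the question.

```python
def tokens(text):
    """Lowercase the text and split it on whitespace into a set of words.

    Assumption: a stand-in for a real word breaker, for illustration only.
    """
    return set(text.lower().split())


def jaccard(a, b):
    """Plain Jaccard index of two sets: |A & B| / |A | B|.

    Returns 0.0 when both sets are empty to avoid dividing by zero.
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# The three rows from the test table in the question.
cars = {
    9: "BMW 330Ci 2009",
    14: "AUDI A4 2010",
    16: "AUDI A3 2.0T 2009",
}

# The two search terms, treated as a set of words.
query = tokens("audi bmw")

# Similarity between the query terms and each row's words.
scores = {car_id: jaccard(query, tokens(text)) for car_id, text in cars.items()}

for car_id, score in scores.items():
    print(car_id, round(score, 3))
```

Under these assumptions the three-word rows (CarIDs 9 and 14) score the same, and the four-word row (CarID 16) scores lower, because the extra word enlarges the union without adding a match. That shows how length alone can move a score, but it does not by itself account for the BMW outranking CarID 14.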

My best guess at explaining your results, and I stress that it's only an educated guess, is this:

  • CarIDs 14 and 16 have very similar text: within the first 10 characters they differ in only two positions (4 vs. 3 at position 7, and 0 vs. a period at position 10), so they will be ranked close together. In fact, they come out equal in your example.
  • CarID 9's text is shorter than CarID 16's, so it will merit a higher ranking.
