
Find closest match in SQL

Given a table called "Example" as follows:

ID|A|B|C
--------
01|X|Y|A
02|Z|Z|A
03|Q|P|A
  • Searching for X,Y,A returns row 1 (exact match)
  • Searching for Q,X,A returns row 3 (closest match)

I can do this as multiple separate SQL statements:

select ID from EXAMPLE where A=@A and B=@B and C=@C

... if this returns zero rows, then:

Select ID from EXAMPLE where A=@A and B=@B

... if this returns zero rows, then:

Select ID from EXAMPLE where A=@A and C=@C

... if this returns zero rows, then:

Select ID from EXAMPLE where B=@B and C=@C

... etc.

But I would imagine this is going to be very bad for performance. Is there a better way?

Use a CASE expression in ORDER BY to get the closest matches:

select top 1 with ties * 
from EXAMPLE 
order by
  case when A=@A then 1 else 0 end +
  case when B=@B then 1 else 0 end +
  case when C=@C then 1 else 0 end desc

This will work in SQL Server.
See the demo.

Or with the RANK() window function:

with cte as (
  select *,
    case when A=@A then 1 else 0 end +
    case when B=@B then 1 else 0 end +
    case when C=@C then 1 else 0 end matches  
  from Example
)
select t.ID, t.A, t.B, t.C
from (
  select *, rank() over (order by matches desc) rnk
  from cte 
) t  
where t.rnk = 1 and t.matches > 0

See the demo.
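
To see what the two queries above rank on, here is a minimal sketch that runs the same CASE scoring against the sample table from the question. It is an illustration under assumptions, not the answer's own demo: SQLite (via Python's standard sqlite3 module) stands in for SQL Server, a MAX() subquery stands in for `top 1 with ties` / `rank() = 1`, and the names `closest_rows` and `SAMPLE` are made up for this example.

```python
import sqlite3

# Sample rows from the question: (ID, A, B, C)
SAMPLE = [("01", "X", "Y", "A"), ("02", "Z", "Z", "A"), ("03", "Q", "P", "A")]

def closest_rows(a, b, c, rows=SAMPLE):
    """Return (ID, matches) for every row tied at the highest match count.

    Rows that match on no column are excluded, mirroring the
    `matches > 0` filter in the RANK() version.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Example (ID TEXT, A TEXT, B TEXT, C TEXT)")
    con.executemany("INSERT INTO Example VALUES (?, ?, ?, ?)", rows)
    result = con.execute(
        """
        WITH scored AS (
            SELECT ID,
                   CASE WHEN A = :a THEN 1 ELSE 0 END +
                   CASE WHEN B = :b THEN 1 ELSE 0 END +
                   CASE WHEN C = :c THEN 1 ELSE 0 END AS matches
            FROM Example
        )
        SELECT ID, matches
        FROM scored
        WHERE matches > 0
          AND matches = (SELECT MAX(matches) FROM scored)  -- keeps ties
        ORDER BY ID
        """,
        {"a": a, "b": b, "c": c},
    ).fetchall()
    con.close()
    return result
```

With the sample data, `closest_rows("Q", "X", "A")` returns only row 03 (two matching columns), while a search that matches on C alone returns all three rows as ties.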

Your data model is not ideal for this problem. I would suggest unpivoting, and then using aggregation:

WITH cte1 AS (
    SELECT ID, A AS val FROM EXAMPLE UNION ALL
    SELECT ID, B FROM EXAMPLE UNION ALL
    SELECT ID, C FROM EXAMPLE
),
cte2 AS (
    SELECT ID, ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) rn
    FROM cte1
    WHERE val IN ('Q', 'X', 'A')   -- replace with the values to search
    GROUP BY ID
)

SELECT *
FROM EXAMPLE
WHERE ID = (SELECT ID FROM cte2 WHERE rn = 1);
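
One thing worth checking before adopting the unpivot query: once the columns are stacked into a single `val` column, a value counts as a match no matter which column it came from. The sketch below compares the per-column (positional) score with the unpivoted count on the sample table. It assumes SQLite through Python's sqlite3 module as a stand-in database, and `compare_scores` is a hypothetical helper name.

```python
import sqlite3

# Sample rows from the question: (ID, A, B, C)
SAMPLE = [("01", "X", "Y", "A"), ("02", "Z", "Z", "A"), ("03", "Q", "P", "A")]

def compare_scores(a, b, c, rows=SAMPLE):
    """Return {ID: (positional_score, unpivot_score)} for each row."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Example (ID TEXT, A TEXT, B TEXT, C TEXT)")
    con.executemany("INSERT INTO Example VALUES (?, ?, ?, ?)", rows)
    params = {"a": a, "b": b, "c": c}

    # Score used by the CASE-based answers: each value must be in its own column.
    positional = dict(con.execute(
        """
        SELECT ID,
               CASE WHEN A = :a THEN 1 ELSE 0 END +
               CASE WHEN B = :b THEN 1 ELSE 0 END +
               CASE WHEN C = :c THEN 1 ELSE 0 END
        FROM Example
        """, params))

    # Score used by the unpivot answer: any searched value in any column counts.
    unpivoted = dict(con.execute(
        """
        WITH cte1 AS (
            SELECT ID, A AS val FROM Example UNION ALL
            SELECT ID, B FROM Example UNION ALL
            SELECT ID, C FROM Example
        )
        SELECT ID, COUNT(*)
        FROM cte1
        WHERE val IN (:a, :b, :c)
        GROUP BY ID
        """, params))
    con.close()

    # IDs with no unpivoted match are missing from the grouped result.
    return {rid: (positional[rid], unpivoted.get(rid, 0)) for rid in positional}
```

For the search Q,X,A, row 01 scores 1 positionally but 2 after unpivoting (its X sits in column A, not B), which ties it with row 03. Since ROW_NUMBER() breaks ties arbitrarily, the unpivot query may return row 01 here, so it fits best when column position does not matter.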

From a performance perspective, you probably want:

select t.*
from t
where t.a = @a or t.b = @b or t.c = @c
order by (case when t.a = @a then 1 else 0 end +
          case when t.b = @b then 1 else 0 end +
          case when t.c = @c then 1 else 0 end
         ) desc
fetch first 1 row only;

The where clause is important if your table has any size to it. It guarantees that at least one column matches, which should reduce the amount of data that needs to be sorted.
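
A side effect of that where clause is that a search matching nothing returns no row at all, rather than an arbitrary one. The following sketch runs the filtered query on the question's sample rows to show this. It assumes SQLite via Python's sqlite3 module, so `LIMIT 1` replaces `fetch first 1 row only`; the table name `t` follows the query above, and `best_match` is a hypothetical helper name.

```python
import sqlite3

# Sample rows from the question: (id, a, b, c)
SAMPLE = [("01", "X", "Y", "A"), ("02", "Z", "Z", "A"), ("03", "Q", "P", "A")]

def best_match(a, b, c, rows=SAMPLE):
    """Return the id of the best-matching row, or None if no column matches."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id TEXT, a TEXT, b TEXT, c TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    row = con.execute(
        """
        SELECT id
        FROM t
        WHERE a = :a OR b = :b OR c = :c   -- drop rows with zero matches first
        ORDER BY (CASE WHEN a = :a THEN 1 ELSE 0 END +
                  CASE WHEN b = :b THEN 1 ELSE 0 END +
                  CASE WHEN c = :c THEN 1 ELSE 0 END) DESC
        LIMIT 1
        """,
        {"a": a, "b": b, "c": c},
    ).fetchone()
    con.close()
    return row[0] if row else None
```

On the sample data, `best_match("Q", "X", "A")` gives "03" and `best_match("N", "N", "N")` gives None.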

If you have multiple indexes on the table (see further down), then an exhaustive approach using union all might perform better. Looking for full matches and matches on 2 out of 3 columns, this looks like:

with match_full as (
      select t.*
      from t
      where a = @a and b = @b and c = @c
      fetch first 1 row only
     ),
     match_ab as (
      select *
      from t
      where t.a = @a and t.b = @b and
            not exists (select 1 from match_full)
      fetch first 1 row only
     ),
     match_ac as (
      select *
      from t
      where t.a = @a and t.c = @c and
            not exists (select 1 from match_full) and
            not exists (select 1 from match_ab) 
      fetch first 1 row only
     ),
     match_bc as (
      select *
      from t
      where t.b  = @b and t.c = @c and
            not exists (select 1 from match_full) and
            not exists (select 1 from match_ab) and
            not exists (select 1 from match_ac)
      fetch first 1 row only
     )
select *
from match_full
union all
select *
from match_ab
union all
select *
from match_ac
union all
select *
from match_bc;

In particular, this can take advantage of three indexes: (a, b, c), (a, c), and (b, c). The (a, b, c) index serves both the full match and the (a, b) match, since (a, b) is a prefix of it. Each CTE should be a simple index lookup, and it is hard to see how the query could be faster.

It can, of course, be extended to handle singleton matches as well. Lookups on a alone and on b alone can use the same indexes, though a lookup on c alone would likely need an additional index that leads with c.
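
As a rough sketch of that extension, the tiers can be generated rather than written out by hand: each tier is a CTE that only returns a row if every earlier tier came back empty. This is an illustration under assumptions, not the answer's own code. It uses SQLite through Python's sqlite3 module (so `LIMIT 1` replaces `fetch first 1 row only`), the tier order in `TIERS` is one possible priority, and `build_query` and `closest` are hypothetical names.

```python
import sqlite3

# Sample rows from the question: (id, a, b, c)
SAMPLE = [("01", "X", "Y", "A"), ("02", "Z", "Z", "A"), ("03", "Q", "P", "A")]

# Tiers in priority order; each tuple lists the columns that must all match.
TIERS = [("a", "b", "c"), ("a", "b"), ("a", "c"), ("b", "c"),
         ("a",), ("b",), ("c",)]

def build_query():
    """Build one WITH query containing a CTE per tier, unioned together."""
    ctes, names = [], []
    for cols in TIERS:
        name = "match_" + "".join(cols)
        conds = [f"{col} = :{col}" for col in cols]
        # A tier only fires when every higher-priority tier is empty.
        conds += [f"NOT EXISTS (SELECT 1 FROM {prev})" for prev in names]
        ctes.append(
            f"{name} AS (SELECT * FROM t WHERE {' AND '.join(conds)} LIMIT 1)"
        )
        names.append(name)
    selects = "\nUNION ALL\n".join(f"SELECT * FROM {n}" for n in names)
    return "WITH " + ",\n     ".join(ctes) + "\n" + selects

def closest(a, b, c, rows=SAMPLE):
    """Return at most one row: the first hit in tier order, else an empty list."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id TEXT, a TEXT, b TEXT, c TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = con.execute(build_query(), {"a": a, "b": b, "c": c}).fetchall()
    con.close()
    return result
```

On the sample data, `closest("Q", "X", "A")` falls through to the (a, c) tier and returns row 03, and `closest("Z", "N", "N")` falls through to the a-only tier and returns row 02. Whether each tier is actually an index lookup depends on the database and on which indexes exist.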
