简体   繁体   中英

Find closest match in SQL

Given a table called "Example" as follows;

ID|A|B|C
--------
01|X|Y|A
02|Z|Z|A
03|Q|P|A
  • If searching for X,Y,A = returns row 1 (Exact match)
  • If searching for Q,X,A = returns row 3 (Closest match)

I can do this as multiple seperate SQL statements

select ID from EXAMPLE where A=@A and B=@B and C=@C

... if this returns zero rows, then:

Select ID from EXAMPLE where A=@A and B=@B

... if this returns zero rows, then:

Select ID from EXAMPLE where A=@A and C=@C

... if this returns zero rows, then:

Select ID from EXAMPLE where B=@B and C=@C

... etc.

But I would imagine this is going to be very bad for performance. Is there a better way?

Use a CASE expression in ORDER BY to get the closest matches:

select top 1 with ties * 
from EXAMPLE 
order by
  case when A=@A then 1 else 0 end +
  case when B=@B then 1 else 0 end +
  case when C=@C then 1 else 0 end desc

This will work in SQL Server.
See the demo .

Or with RANK() window function:

with cte as (
  select *,
    case when A=@A then 1 else 0 end +
    case when B=@B then 1 else 0 end +
    case when C=@C then 1 else 0 end matches  
  from Example
)
select t.ID, t.A, t.b, t.C 
from (
  select *, rank() over (order by matches desc) rnk
  from cte 
) t  
where t.rnk = 1 and t.matches > 0

See the demo .

Your data model is not ideal for this problem. I would suggest unpivoting, and then using aggregation:

WITH cte1 AS (
    SELECT ID, A AS val FROM EXAMPLE UNION ALL
    SELECT ID, B FROM EXAMPLE UNION ALL
    SELECT ID, C FROM EXAMPLE
),
cte2 AS (
    SELECT ID, ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) rn
    FROM cte1
    WHERE val IN ('Q', 'X', 'A')   -- replace with the values to search
    GROUP BY ID
)

SELECT *
FROM EXAMPLE
WHERE ID = (SELECT ID FROM cte2 WHERE rn = 1);

From a performance perspective, you probably want:

select t.*
from t
where t.a = @a or t.b = @b or t.c = @c
order by (case when t.a = @a then 1 else 0 end +
          case when t.b = @b then 1 else 0 end +
          case when t.c = @c then 1 else 0 end
         ) desc
fetch first 1 row only;

The where clause is important if your table has any size to it. It guarantees that at least one column matches -- and that should reduces the amount of data needed for sorting.

If you have multiple indexes on the table (see further down), then an exhaustive approach using union all might provide to have better performance. Looking for full matches and matches on 2 out of 3, this looks like:

with match_full as (
      select t.*
      from t
      where a = @a and b = @b and c = @c
      fetch first 1 row only
     ),
     match_ab as (
      select *
      from t
      where t.a = @a and t.b = @b and
            not exists (select 1 from match_full)
      fetch first 1 row only
     ),
     match_ac as (
      select *
      from t
      where t.a = @a and t.c = @c and
            not exists (select 1 from match_full) and
            not exists (select 1 from match_ab) 
      fetch first 1 row only
     ),
     match_bc as (
      select *
      from t
      where t.b  = @b and t.c = @c and
            not exists (select 1 from match_full) and
            not exists (select 1 from match_ab) 
      fetch first 1 row only
     )
select *
from match_full
union all
select *
from match_ab
union all
select *
from match_ac
union all
select *
from match bc;

In particular, this can take advantage of three indexes: (a, b, c) , (a, c) , and (b, c) . Each CTE should be a simple index lookup and it is hard to see how the query could be faster.

It can, of course, be extended to handle singleton matches as well -- using the same indexes.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM