
How to find the row with the maximum matching columns?

Suppose there are three tables, TableItem, TableAbcd and TablePqrs, as below:

TableItem
ID  item
1   item1

TableAbcd
ID  Item    ColA    ColB    ColC    ColD
1   item1   A1      B1      C1      D1


TablePqrs
ID  item    ColA    ColB    ColC    ColD    ColValue
1   item1   A1      B1      null    null    10000
2   item1   A1      B1      C1      D1      100

For a given item, the output must contain exactly one record: the one with the greatest number of matching columns between TableAbcd and TablePqrs. Here, row 1 of TableAbcd has the most matching columns with row 2 of TablePqrs (all four of ColA to ColD, versus only ColA and ColB for row 1).

The expected output of joining the three tables above is:

item    ColA    ColB    ColC    ColD    ColValue
item1   A1      B1      C1      D1      100
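To make "maximum matching columns" concrete, here is a small sketch that recreates the sample data and counts, for each TablePqrs row, how many of ColA to ColD equal the corresponding TableAbcd column. It assumes SQLite or MySQL 8 and VARCHAR/INT column types, which the question does not state; a NULL never counts as a match.

```sql
-- Sample data from the question (column types are assumptions)
CREATE TABLE TableAbcd (ID INT, item VARCHAR(20), ColA VARCHAR(10), ColB VARCHAR(10),
                        ColC VARCHAR(10), ColD VARCHAR(10));
CREATE TABLE TablePqrs (ID INT, item VARCHAR(20), ColA VARCHAR(10), ColB VARCHAR(10),
                        ColC VARCHAR(10), ColD VARCHAR(10), ColValue INT);
INSERT INTO TableAbcd VALUES (1, 'item1', 'A1', 'B1', 'C1', 'D1');
INSERT INTO TablePqrs VALUES (1, 'item1', 'A1', 'B1', NULL, NULL, 10000),
                             (2, 'item1', 'A1', 'B1', 'C1', 'D1', 100);

-- One output row per TablePqrs row, with its number of matching columns.
-- A comparison against NULL is not true, so the CASE falls through to 0.
SELECT p.ID, p.ColValue,
       CASE WHEN p.ColA = b.ColA THEN 1 ELSE 0 END
     + CASE WHEN p.ColB = b.ColB THEN 1 ELSE 0 END
     + CASE WHEN p.ColC = b.ColC THEN 1 ELSE 0 END
     + CASE WHEN p.ColD = b.ColD THEN 1 ELSE 0 END AS matches
  FROM TablePqrs p
  JOIN TableAbcd b ON b.item = p.item
 ORDER BY p.ID;
```

With the sample data this gives 2 matches for TablePqrs row 1 and 4 for row 2, so row 2 (ColValue 100) is the record the question wants.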

Code tried so far:

SELECT a.item, c.ColA, c.ColB, c.ColC, c.ColD, c.ColValue
     FROM TableItem a 
         LEFT OUTER JOIN TableAbcd b
         ON a.item = b.item      
         LEFT OUTER JOIN TablePqrs c
         ON (b.ColA = c.ColA AND b.ColB = c.ColB AND b.ColC = c.ColC AND b.ColD = c.ColD)
         OR (b.ColA = c.ColA AND b.ColB = c.ColB AND b.ColC = c.ColC)
         OR (b.ColA = c.ColA AND b.ColB = c.ColB)

It fetches two records. I know there may be design issues, but we are getting the data from a third-party legacy system whose table structure suits its own needs, and we are sending it on to another interface.

Please suggest.
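One way to see why the attempt above returns two records is to evaluate each OR branch of its second join separately for every TablePqrs row. This is a sketch assuming SQLite or MySQL 8, with the question's sample data and assumed column types:

```sql
-- Sample data from the question (column types are assumptions)
CREATE TABLE TableAbcd (ID INT, item VARCHAR(20), ColA VARCHAR(10), ColB VARCHAR(10),
                        ColC VARCHAR(10), ColD VARCHAR(10));
CREATE TABLE TablePqrs (ID INT, item VARCHAR(20), ColA VARCHAR(10), ColB VARCHAR(10),
                        ColC VARCHAR(10), ColD VARCHAR(10), ColValue INT);
INSERT INTO TableAbcd VALUES (1, 'item1', 'A1', 'B1', 'C1', 'D1');
INSERT INTO TablePqrs VALUES (1, 'item1', 'A1', 'B1', NULL, NULL, 10000),
                             (2, 'item1', 'A1', 'B1', 'C1', 'D1', 100);

-- CASE turns "true" into 1 and "false or unknown (NULL)" into 0.
SELECT c.ID,
       CASE WHEN b.ColA = c.ColA AND b.ColB = c.ColB AND b.ColC = c.ColC AND b.ColD = c.ColD
            THEN 1 ELSE 0 END AS four_cols,
       CASE WHEN b.ColA = c.ColA AND b.ColB = c.ColB AND b.ColC = c.ColC
            THEN 1 ELSE 0 END AS three_cols,
       CASE WHEN b.ColA = c.ColA AND b.ColB = c.ColB
            THEN 1 ELSE 0 END AS two_cols
  FROM TableAbcd b
  CROSS JOIN TablePqrs c
 ORDER BY c.ID;
```

Row 1 satisfies only the two-column branch and row 2 satisfies all three. Because the branches are combined with OR, both rows pass the join, so nothing in the join itself prefers the better match.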

Here the question is: how many columns match between b and c?

For the join condition you only need at least one column of b to match the same column in c:

from c
left join b
     on c.A = b.A or c.B = b.B or c.C = b.C or c.D = b.D

You can calculate the number of matches with:

(case when c.A = b.A then 1 else 0 end) 
+ (case when c.B = b.B then 1 else 0 end)
+ (case when c.C = b.C then 1 else 0 end)
+ (case when c.D = b.D then 1 else 0 end) as matches

Then simply order by the number of matches (descending) and limit the result to 1 row.

select 
   c.id, c.item, c.A, c.B, c.C, c.D, c.colValue,
   -- one point for each of the four columns that agree (a NULL never matches)
   (case when c.A = b.A then 1 else 0 end) 
   + (case when c.B = b.B then 1 else 0 end)
   + (case when c.C = b.C then 1 else 0 end)
   + (case when c.D = b.D then 1 else 0 end) as matches
from c
left join b
     on c.A = b.A or c.B = b.B or c.C = b.C or c.D = b.D
order by matches desc   -- best match first
limit 1;

I've set up a rextester example just to check it: http://rextester.com/IPA67860
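Note that `limit 1` keeps a single row for the whole result, not one per item. If TableItem holds several items, one option (an assumption on my part, not part of this answer) is to number the scored rows per item with ROW_NUMBER(), which MySQL 8 and SQLite 3.25+ support. The item2 rows marked hypothetical are made up only to show a second item, and the tie-break on the higher TablePqrs ID is an arbitrary choice:

```sql
-- Sample data from the question (column types are assumptions)
CREATE TABLE TableAbcd (ID INT, item VARCHAR(20), ColA VARCHAR(10), ColB VARCHAR(10),
                        ColC VARCHAR(10), ColD VARCHAR(10));
CREATE TABLE TablePqrs (ID INT, item VARCHAR(20), ColA VARCHAR(10), ColB VARCHAR(10),
                        ColC VARCHAR(10), ColD VARCHAR(10), ColValue INT);
INSERT INTO TableAbcd VALUES (1, 'item1', 'A1', 'B1', 'C1', 'D1'),
                             (2, 'item2', 'A2', 'B2', 'C2', 'D2');   -- hypothetical
INSERT INTO TablePqrs VALUES (1, 'item1', 'A1', 'B1', NULL, NULL, 10000),
                             (2, 'item1', 'A1', 'B1', 'C1', 'D1', 100),
                             (3, 'item2', 'A2', 'B2', 'C2', NULL, 500),   -- hypothetical
                             (4, 'item2', 'A2', NULL, NULL, NULL, 900);  -- hypothetical

-- Score every TablePqrs row, number the rows per item from best to worst,
-- and keep only the first one for each item.
SELECT item, ColA, ColB, ColC, ColD, ColValue
  FROM (SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY s.item
                                  ORDER BY s.matches DESC, s.ID DESC) AS rn
          FROM (SELECT p.*,
                       CASE WHEN p.ColA = b.ColA THEN 1 ELSE 0 END
                     + CASE WHEN p.ColB = b.ColB THEN 1 ELSE 0 END
                     + CASE WHEN p.ColC = b.ColC THEN 1 ELSE 0 END
                     + CASE WHEN p.ColD = b.ColD THEN 1 ELSE 0 END AS matches
                  FROM TablePqrs p
                  JOIN TableAbcd b ON b.item = p.item) s) ranked
 WHERE rn = 1
 ORDER BY item;
```

With this data it returns TablePqrs row 2 for item1 (ColValue 100, 4 matches) and row 3 for item2 (ColValue 500, 3 matches).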

With TableAbcd aliased as a and TablePqrs as p, the number of matches is (p.cola = a.cola) + (p.colb = a.colb) + (p.colc = a.colc) + (p.cold = a.cold), because in MySQL true is 1 and false is 0.
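One caveat with this arithmetic: a comparison involving NULL yields NULL rather than 0, and NULL plus anything is NULL, so TablePqrs row 1 (with ColC and ColD NULL) gets a NULL count instead of 2. The literals below stand in for that row compared with TableAbcd row 1; the query runs in MySQL or SQLite, and wrapping each comparison in COALESCE(..., 0) is one way to keep the count numeric:

```sql
-- Literal values stand in for TablePqrs row 1 versus TableAbcd row 1.
SELECT ('A1' = 'A1') + ('B1' = 'B1') + (NULL = 'C1') + (NULL = 'D1') AS raw_matches,
       COALESCE('A1' = 'A1', 0) + COALESCE('B1' = 'B1', 0)
     + COALESCE(NULL = 'C1', 0) + COALESCE(NULL = 'D1', 0) AS coalesced_matches;
```

raw_matches comes back NULL and coalesced_matches comes back 2. Without the COALESCE, any `>` comparison against a NULL count is unknown rather than true, so such a row can neither beat nor be beaten by another row.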

Now you are looking for the p records for which no other p record exists with a higher number of matches:

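select *
from tablepqrs p1
where not exists
(
  select *
  from tablepqrs p2
  join tableabcd a on a.item = p2.item
  where p2.item = p1.item
  -- coalesce(..., 0): a comparison with NULL yields NULL, which would make the whole sum NULL
  and coalesce(p2.cola = a.cola, 0) + coalesce(p2.colb = a.colb, 0)
    + coalesce(p2.colc = a.colc, 0) + coalesce(p2.cold = a.cold, 0) >
      coalesce(p1.cola = a.cola, 0) + coalesce(p1.colb = a.colb, 0)
    + coalesce(p1.colc = a.colc, 0) + coalesce(p1.cold = a.cold, 0)
);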

The code below shows another way to filter. It's similar to the approach proposed by McNets, but it uses window functions.

The key is to compute a ranking that identifies the TablePqrs row with the best match; here the ranking is the number of non-NULL values among ColA to ColD in each TablePqrs row that matches TableAbcd on at least one column. On the other hand, if two rows have the same ranking for the same item, we need an additional criterion to break the tie. In the example, the tie-breaker is the ID of the TablePqrs table (highest first). I'm not using outer joins, so there will be no results for TableItem records without a match ranking.

I'm not entirely sure it fits what you want; just try it and draw your own conclusions.

SELECT TableItem.id, 
       TableItem.item, 
       TablePqrs.colA, 
       TablePqrs.colB, 
       TablePqrs.colC, 
       TablePqrs.colD, 
       TablePqrs.ColValue
  FROM TableItem
  INNER JOIN (SELECT DISTINCT 
                     tableItemId, 
                     -- best-ranked TablePqrs row per item; ties go to the highest TablePqrs ID
                     FIRST_VALUE(tablePqrsId) OVER (PARTITION BY tableItemId ORDER BY ranking DESC, tablePqrsId DESC) tablePqrsId  
          FROM (SELECT rankTableItem.ID tableItemId, 
                       rankTablePqrs.ID tablePqrsId, 
                       -- ranking = number of non-NULL ColA..ColD values in the TablePqrs row
                       CASE WHEN rankTablePqrs.colA IS NULL THEN 0 ELSE 1 END + 
                       CASE WHEN rankTablePqrs.colB IS NULL THEN 0 ELSE 1 END +
                       CASE WHEN rankTablePqrs.colC IS NULL THEN 0 ELSE 1 END +
                       CASE WHEN rankTablePqrs.colD IS NULL THEN 0 ELSE 1 END ranking
                  FROM TableItem rankTableItem 
                  INNER JOIN TableAbcd rankTableAbcd ON rankTableItem.item = rankTableAbcd.item      
                  INNER JOIN TablePqrs rankTablePqrs ON rankTablePqrs.item = rankTableAbcd.item 
                                                        AND (rankTableAbcd.colA = rankTablePqrs.colA 
                                                              OR rankTableAbcd.colB = rankTablePqrs.colB 
                                                              OR rankTableAbcd.colC = rankTablePqrs.colC 
                                                              OR rankTableAbcd.colD = rankTablePqrs.colD)) rankedRows  -- every derived table needs an alias
             ) pivotTable ON pivotTable.tableItemId = TableItem.Id
  INNER JOIN TablePqrs ON TablePqrs.Id = pivotTable.tablePqrsId;

I tried the following and it worked; COALESCE lets me prioritise which value to pick, based on the order in which I list its arguments.

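SELECT a.item, Fact.ColA, Fact.ColB, Fact.ColC, Fact.ColD, Fact.ColValue
     FROM TableItem a 
    LEFT OUTER JOIN (
         SELECT b.item, b.ColA, b.ColB, b.ColC, b.ColD,
            -- prefer the 4-column match, then 3 columns, then 2;
            -- MAX() keeps the pick deterministic under GROUP BY
            COALESCE(MAX(c1.ColValue), MAX(c2.ColValue), MAX(c3.ColValue)) ColValue
        FROM TableAbcd b
        LEFT OUTER JOIN TablePqrs c1 
            ON b.ColA = c1.ColA AND b.ColB = c1.ColB AND b.ColC = c1.ColC AND b.ColD = c1.ColD
        LEFT OUTER JOIN TablePqrs c2 
            ON b.ColA = c2.ColA AND b.ColB = c2.ColB AND b.ColC = c2.ColC
        LEFT OUTER JOIN TablePqrs c3 
            ON b.ColA = c3.ColA AND b.ColB = c3.ColB
        GROUP BY b.item, b.ColA, b.ColB, b.ColC, b.ColD
     ) AS Fact 
     ON Fact.item = a.item;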
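The priority comes from COALESCE returning its first non-NULL argument. A tiny illustration, with literal values standing in for the c1, c2 and c3 lookups (the numbers come from the sample data; which lookups are NULL is hypothetical):

```sql
-- Arguments stand in for the 4-, 3- and 2-column lookups, in that order.
SELECT COALESCE(100, 100, 10000)   AS four_column_match_found,  -- 100
       COALESCE(NULL, NULL, 10000) AS only_two_column_match;    -- 10000
```

Putting the strictest join first means a looser match is used only when every stricter lookup came back NULL.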

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM