MySQL SELECT rows with maximum similarity

Question

I have a table like this:

id | val1 | val2 | val3 | val4 | val5

How can I find the row(s) with the highest number of matching values (not all values have to match)?
Example: say I have 4 rows in the table:

1 | Mercedes | E | black | 250hp | 270kmh | 
2 | Mercedes | C | white | 250hp | 240kmh | 
3 | BMW      | C | white | 250hp | 250kmh | 
4 | PORCHE   | E | red   | 300hp | 290kmh |

I select:

val1=PORCHE val2=E val3=red val4=250 val5=270

and get:

1 | Mercedes | E | black | 250hp | 270kmh | 
4 | PORCHE   | E | red   | 300hp | 290kmh |

because both have 3 equal fields.
The question is not really about cars, and I hope to do this with a single table.
The table is actually for checking a user's hardware and comparing whether it is exactly equal or how many fields are equal.

Answer 1

I have re-created your case locally with the following sample data model:

CREATE TABLE `cars` (
  `id` int(11) NOT NULL,
  `val1` varchar(45) DEFAULT NULL,
  `val2` varchar(45) DEFAULT NULL,
  `val3` varchar(45) DEFAULT NULL,
  `val4` varchar(45) DEFAULT NULL,
  `val5` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;


INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('1', 'Mercedes', 'E', 'black', '250hp', '270kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('2', 'Mercedes', 'C', 'white', '250hp', '240kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('3', 'BMW', 'C', 'white', '250hp', '250kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('4', 'PORSCHE', 'E', 'red', '300hp', '270kmh');

To get your "similarity votes" you could do something like this:

select id, count(*) as votes from
(
    select id from cars where val1 = 'PORSCHE'
    union all
    select id from cars where val2 = 'E'
    union all 
    select id from cars where val3 = 'red'
    union all
    select id from cars where val4 = '250hp'
    union all
    select id from cars where val5 = '270kmh'
) as votes
group by id

With your test data this generates something like this:

id  votes
1   3
2   1
3   1
4   4

Now comes the tricky part: we only want the rows with the maximal votes (the best fit). The challenge is that the votes query above is needed twice: once to extract the maximum, and a second time to determine the ids associated with that maximum. If you only want the first best match, you can append order by votes desc limit 1. If you want all ids that share the highest vote count, you could do something like:

select * from (
    select id, count(*) as votes from
    (
        select id from cars where val1 = 'PORSCHE'
        union all
        select id from cars where val2 = 'E'
        union all 
        select id from cars where val3 = 'red'
        union all
        select id from cars where val4 = '250hp'
        union all
        select id from cars where val5 = '270kmh'
    ) as votes
    group by id
) hits where votes = (
    select max(votes) from (
        select id, count(*) as votes from
        (
            select id from cars where val1 = 'PORSCHE'
            union all
            select id from cars where val2 = 'E'
            union all 
            select id from cars where val3 = 'red'
            union all
            select id from cars where val4 = '250hp'
            union all
            select id from cars where val5 = '270kmh'
        ) as votes
        group by id
    ) as hits
)

Unfortunately, this duplicates the selection query (which also has to be computed twice). There is a long discussion of how best to solve this kind of problem in the question "SQL select only rows with max value on a column".
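
On MySQL 8.0 or later, one way to avoid writing the votes query twice is a common table expression. The following is only a sketch against the sample `cars` table above; the names `hits` and `matched` are illustrative, and whether the CTE is evaluated only once is left to the optimizer.

```sql
-- Sketch for MySQL 8.0+: name the votes query once, then reference it twice.
WITH hits AS (
    SELECT id, COUNT(*) AS votes
    FROM (
        SELECT id FROM cars WHERE val1 = 'PORSCHE'
        UNION ALL
        SELECT id FROM cars WHERE val2 = 'E'
        UNION ALL
        SELECT id FROM cars WHERE val3 = 'red'
        UNION ALL
        SELECT id FROM cars WHERE val4 = '250hp'
        UNION ALL
        SELECT id FROM cars WHERE val5 = '270kmh'
    ) AS matched
    GROUP BY id
)
-- Keep every id whose vote count equals the overall maximum (ties included).
SELECT id, votes
FROM hits
WHERE votes = (SELECT MAX(votes) FROM hits);
```

With the sample data above this should return only id 4, which has 4 votes.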

In your case I would also consider writing the "similarity votes" to a temporary table (if you expect many rows to be compared). Whether this is appropriate depends on what kind of database access your application has.
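
A minimal sketch of the temporary-table idea follows. The names `tmp_votes`, `matched` and `@max_votes` are made up for illustration. Because MySQL does not allow a TEMPORARY table to be referenced more than once in the same query, the maximum is stored in a user variable first instead of being read in a subquery.

```sql
-- Sketch only: tmp_votes and @max_votes are illustrative names.
DROP TEMPORARY TABLE IF EXISTS tmp_votes;

-- Compute the votes once and keep them for the rest of the session.
CREATE TEMPORARY TABLE tmp_votes AS
SELECT id, COUNT(*) AS votes
FROM (
    SELECT id FROM cars WHERE val1 = 'PORSCHE'
    UNION ALL
    SELECT id FROM cars WHERE val2 = 'E'
    UNION ALL
    SELECT id FROM cars WHERE val3 = 'red'
    UNION ALL
    SELECT id FROM cars WHERE val4 = '250hp'
    UNION ALL
    SELECT id FROM cars WHERE val5 = '270kmh'
) AS matched
GROUP BY id;

-- A TEMPORARY table cannot be opened twice in one query,
-- so fetch the maximum separately.
SET @max_votes = (SELECT MAX(votes) FROM tmp_votes);

-- Join back to cars to return the full best-matching rows, ties included.
SELECT c.*, v.votes
FROM tmp_votes AS v
JOIN cars AS c ON c.id = v.id
WHERE v.votes = @max_votes;

DROP TEMPORARY TABLE tmp_votes;
```

This needs a connection that stays open across the statements and the privilege to create temporary tables, which is why it depends on the kind of database access the application has.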

Answer 2

This is a solution for MySQL 8.0+ using the rank() window function, which returns the best-matching rows, including ties:

with cte as (
  select *,
    rank() over (order by
      (val1 = 'PORSCHE') + 
      (val2 = 'E') + 
      (val3 = 'red') + 
      (val4 = '250hp') + 
      (val5 = '270kmh') desc 
    ) rn
  from tablename  
) 
select * from cte
where rn = 1

See the demo.
This works in earlier versions, but it does not return ties, only the first best match:

select *
from tablename 
order by 
      (val1 = 'PORSCHE') + 
      (val2 = 'E') + 
      (val3 = 'red') + 
      (val4 = '250hp') + 
      (val5 = '270kmh') desc 
limit 1

See the demo.
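
These queries work because MySQL evaluates a comparison such as val1 = 'PORSCHE' to 1 or 0, so the sum is the number of matching columns. One caveat, assuming the columns are nullable as in the `cars` definition from Answer 1: if a column is NULL, the comparison yields NULL and so does the whole sum, and such a row sorts last under desc even when its other columns match. A possible variant uses the NULL-safe operator `<=>`, which always returns 1 or 0. The alias `matches` is illustrative.

```sql
-- Sketch: NULL-safe variant of the "first best match" query.
SELECT t.*,
       (val1 <=> 'PORSCHE') +
       (val2 <=> 'E') +
       (val3 <=> 'red') +
       (val4 <=> '250hp') +
       (val5 <=> '270kmh') AS matches   -- number of equal columns, never NULL
FROM tablename AS t
ORDER BY matches DESC
LIMIT 1;
```

The same substitution can be made inside the rank() version if ties should be returned.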

Answer 3

Based on the description, I think this would be the simplest solution:

select t.*
from (select t.*,
             rank() over (order by (t.val1 = @val1) + (t.val2 = @val2) + (t.val3 = @val3) + (t.val4 = @val4) + (t.val5 = @val5) desc) as seqnum
      from t
     ) t
where seqnum = 1;

In versions of MySQL before version 8, this is a little more complicated, but not that bad:

select t.*
from t
where ( (t.val1 = @val1) + (t.val2 = @val2) + (t.val3 = @val3) + (t.val4 = @val4) + (t.val5 = @val5)
      ) = (select max( (t2.val1 = @val1) + (t2.val2 = @val2) + (t2.val3 = @val3) + (t2.val4 = @val4) + (t2.val5 = @val5) )
           from t t2
          );
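
The queries in this answer expect the search values in user variables. As a rough sketch, they could be set in the same session as shown below; the concrete values are taken from the sample data in Answer 1, and the alias `equal_fields` is illustrative. Selecting the sum as a column also shows how many fields are equal for every row, which is what the question asks for besides the best match.

```sql
-- Sketch: supply the search values once per session.
SET @val1 = 'PORSCHE',
    @val2 = 'E',
    @val3 = 'red',
    @val4 = '250hp',
    @val5 = '270kmh';

-- List every row with its number of equal fields, best matches first.
SELECT t.*,
       (t.val1 = @val1) + (t.val2 = @val2) + (t.val3 = @val3)
     + (t.val4 = @val4) + (t.val5 = @val5) AS equal_fields
FROM t
ORDER BY equal_fields DESC;
```

A row with equal_fields = 5 is an exact match on all five columns.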
