MySQL SELECT row with maximal similarity

I have a table like this:

id | val1 | val2 | val3 | val4 | val5

How can I find the row(s) with the maximum number of matching values (not all values have to match)?
Example: say I have 4 rows in the table:

1 | Mercedes | E | black | 250hp | 270kmh | 
2 | Mercedes | C | white | 250hp | 240kmh | 
3 | BMW      | C | white | 250hp | 250kmh | 
4 | PORCHE   | E | red   | 300hp | 290kmh | 

I search for:

val1=PORCHE val2=E val3=red val4=250 val5=270 

and expect to get:

1 | Mercedes | E | black | 250hp | 270kmh | 
4 | PORCHE   | E | red   | 300hp | 290kmh | 

because both have 3 matching fields.
The question is not really about cars, and I hope to do this with a single table.
The table is used to check a user's hardware and to determine whether it is completely equal, or how many fields are equal.
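The scoring rule in the question can be sketched in plain Python before turning to SQL: a row's similarity is the number of columns val1 … val5 that equal the searched value, and the answer is every row with the highest score. This is a minimal sketch; it assumes that the question's "250" and "270" stand for the full stored strings '250hp' and '270kmh', and the function names similarity and best_matches are hypothetical.

```python
# Rows from the question: (id, val1, val2, val3, val4, val5)
ROWS = [
    (1, "Mercedes", "E", "black", "250hp", "270kmh"),
    (2, "Mercedes", "C", "white", "250hp", "240kmh"),
    (3, "BMW", "C", "white", "250hp", "250kmh"),
    (4, "PORCHE", "E", "red", "300hp", "290kmh"),
]

# Search values; '250hp' and '270kmh' are assumed to be the intended
# full strings for the question's "250" and "270".
SEARCH = ("PORCHE", "E", "red", "250hp", "270kmh")


def similarity(row, search):
    """Count how many of val1..val5 equal the corresponding search value."""
    return sum(a == b for a, b in zip(row[1:], search))


def best_matches(rows, search):
    """Return the ids of all rows that share the highest similarity score."""
    scores = {row[0]: similarity(row, search) for row in rows}
    best = max(scores.values())
    return [row_id for row_id, score in scores.items() if score == best]


print(best_matches(ROWS, SEARCH))  # [1, 4]
```

Rows 1 and 4 both score 3, which is exactly the tie the question expects.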

I have re-created your case locally with the following sample data model:

CREATE TABLE `cars` (
  `id` int(11) NOT NULL,
  `val1` varchar(45) DEFAULT NULL,
  `val2` varchar(45) DEFAULT NULL,
  `val3` varchar(45) DEFAULT NULL,
  `val4` varchar(45) DEFAULT NULL,
  `val5` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;


INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('1', 'Mercedes', 'E', 'black', '250hp', '270kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('2', 'Mercedes', 'C', 'white', '250hp', '240kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('3', 'BMW', 'C', 'white', '250hp', '250kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('4', 'PORSCHE', 'E', 'red', '300hp', '270kmh');

To get your "similarity votes", you could do something like this:

select id, count(*) as votes from
(
    select id from cars where val1 = 'PORSCHE'
    union all
    select id from cars where val2 = 'E'
    union all 
    select id from cars where val3 = 'red'
    union all
    select id from cars where val4 = '250hp'
    union all
    select id from cars where val5 = '270kmh'
) as votes
group by id

With this test data, the query produces:

id  votes
1   3
2   1
3   1
4   4
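To check the vote counts above, here is a minimal sketch that runs the same union all query from Python against an in-memory SQLite database. SQLite stands in for MySQL here (an assumption made to keep the example self-contained; note that SQLite compares strings case-sensitively by default, unlike MySQL's default collations). The table and rows are the ones from the CREATE TABLE and INSERT statements, and the order by id is added only to make the output order predictable.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT,"
    " val3 TEXT, val4 TEXT, val5 TEXT)"
)
# Same sample rows as the INSERT statements above.
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mercedes", "E", "black", "250hp", "270kmh"),
        (2, "Mercedes", "C", "white", "250hp", "240kmh"),
        (3, "BMW", "C", "white", "250hp", "250kmh"),
        (4, "PORSCHE", "E", "red", "300hp", "270kmh"),
    ],
)

# Each branch emits the id of every row matching one criterion;
# counting ids per group yields the number of matching columns.
VOTES_SQL = """
select id, count(*) as votes from
(
    select id from cars where val1 = 'PORSCHE'
    union all
    select id from cars where val2 = 'E'
    union all
    select id from cars where val3 = 'red'
    union all
    select id from cars where val4 = '250hp'
    union all
    select id from cars where val5 = '270kmh'
) as votes
group by id
order by id
"""

votes = dict(conn.execute(VOTES_SQL).fetchall())
print(votes)  # {1: 3, 2: 1, 3: 1, 4: 4}
```

A row that matches none of the criteria never appears in any branch of the union, so it is missing from the result rather than listed with 0 votes.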

Now comes the tricky part: we only want the rows with the maximal votes (the best fit). The challenge is that the votes query above is needed twice: once to extract the maximum, and a second time to determine the ids associated with that maximum. If you only want the "first best match", you can use order by votes desc limit 1. If you want "all ids that have the highest votes", you could do something like this:

select * from (
    select id, count(*) as votes from
    (
        select id from cars where val1 = 'PORSCHE'
        union all
        select id from cars where val2 = 'E'
        union all 
        select id from cars where val3 = 'red'
        union all
        select id from cars where val4 = '250hp'
        union all
        select id from cars where val5 = '270kmh'
    ) as votes
    group by id
) hits where votes = (
    select max(votes) from (
        select id, count(*) as votes from
        (
            select id from cars where val1 = 'PORSCHE'
            union all
            select id from cars where val2 = 'E'
            union all 
            select id from cars where val3 = 'red'
            union all
            select id from cars where val4 = '250hp'
            union all
            select id from cars where val5 = '270kmh'
        ) as votes
        group by id
    ) as hits
)
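The duplication is easy to see when the statement is built from a single vote subquery, sketched here in Python with SQLite standing in for MySQL. As an assumption for illustration, row 4 keeps the question's original 290kmh so that rows 1 and 4 tie with 3 votes each, which shows that the query returns all tied ids.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT,"
    " val3 TEXT, val4 TEXT, val5 TEXT)"
)
# Sample rows from above, except row 4 keeps the question's '290kmh'
# (an assumption) so that rows 1 and 4 tie with 3 votes each.
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mercedes", "E", "black", "250hp", "270kmh"),
        (2, "Mercedes", "C", "white", "250hp", "240kmh"),
        (3, "BMW", "C", "white", "250hp", "250kmh"),
        (4, "PORSCHE", "E", "red", "300hp", "290kmh"),
    ],
)

# The per-row vote query, which has to appear twice in the final statement.
VOTES = """
    select id, count(*) as votes from (
        select id from cars where val1 = 'PORSCHE'
        union all select id from cars where val2 = 'E'
        union all select id from cars where val3 = 'red'
        union all select id from cars where val4 = '250hp'
        union all select id from cars where val5 = '270kmh'
    ) as votes
    group by id
"""

# Outer copy lists every id with its votes; inner copy computes the maximum.
BEST_SQL = f"""
select * from ({VOTES}) hits
where votes = (select max(votes) from ({VOTES}) as hits)
"""

best = sorted(conn.execute(BEST_SQL).fetchall())
print(best)  # [(1, 3), (4, 3)]
```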

Unfortunately, this duplicates the selection query (which also has to be computed twice). There is an extensive discussion of how best to solve this kind of problem at SQL select only rows with max value on a column.
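On MySQL 8.0+, a common table expression (a WITH clause) is one way to write the vote query only once; whether the engine also evaluates it only once is up to the optimizer. This swaps in a different technique from the answer above. It is sketched here with SQLite as a stand-in and the same assumed tie data, and the CTE name vote_counts is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT,"
    " val3 TEXT, val4 TEXT, val5 TEXT)"
)
# Assumed data: row 4 uses '290kmh' so that rows 1 and 4 tie.
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mercedes", "E", "black", "250hp", "270kmh"),
        (2, "Mercedes", "C", "white", "250hp", "240kmh"),
        (3, "BMW", "C", "white", "250hp", "250kmh"),
        (4, "PORSCHE", "E", "red", "300hp", "290kmh"),
    ],
)

# The vote query is written once as a CTE and referenced twice below.
BEST_SQL = """
with vote_counts as (
    select id, count(*) as votes from (
        select id from cars where val1 = 'PORSCHE'
        union all select id from cars where val2 = 'E'
        union all select id from cars where val3 = 'red'
        union all select id from cars where val4 = '250hp'
        union all select id from cars where val5 = '270kmh'
    ) as matches
    group by id
)
select id, votes from vote_counts
where votes = (select max(votes) from vote_counts)
order by id
"""

best = conn.execute(BEST_SQL).fetchall()
print(best)  # [(1, 3), (4, 3)]
```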

In your case, I would also consider writing the "similarity votes" to a temporary table (if you expect many rows to be compared). Whether this is appropriate depends on what kind of database access your application has.
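A minimal sketch of the temporary-table idea, again with SQLite standing in for MySQL, the same assumed tie data, and a hypothetical table name similarity. The votes are stored once, the maximum is read in a separate statement, and then the ids with that maximum are selected. Reading the maximum separately is deliberate: MySQL does not allow a TEMPORARY table to be referenced more than once in the same query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT,"
    " val3 TEXT, val4 TEXT, val5 TEXT)"
)
# Assumed data: row 4 uses '290kmh' so that rows 1 and 4 tie.
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mercedes", "E", "black", "250hp", "270kmh"),
        (2, "Mercedes", "C", "white", "250hp", "240kmh"),
        (3, "BMW", "C", "white", "250hp", "250kmh"),
        (4, "PORSCHE", "E", "red", "300hp", "290kmh"),
    ],
)

# Step 1: materialize the similarity votes once (table name is hypothetical).
conn.execute("""
create temp table similarity as
select id, count(*) as votes from (
    select id from cars where val1 = 'PORSCHE'
    union all select id from cars where val2 = 'E'
    union all select id from cars where val3 = 'red'
    union all select id from cars where val4 = '250hp'
    union all select id from cars where val5 = '270kmh'
) as matches
group by id
""")

# Step 2: read the maximum in its own statement.
(max_votes,) = conn.execute("select max(votes) from similarity").fetchone()

# Step 3: select every id that reached that maximum.
best = conn.execute(
    "select id, votes from similarity where votes = ? order by id",
    (max_votes,),
).fetchall()
print(max_votes, best)  # 3 [(1, 3), (4, 3)]
```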

Here is a solution for MySQL 8.0+ using the rank() window function, which returns the best-matching rows including ties:

with cte as (
  select *,
    rank() over (order by
      (val1 = 'PORSCHE') + 
      (val2 = 'E') + 
      (val3 = 'red') + 
      (val4 = '250hp') + 
      (val5 = '270kmh') desc 
    ) rn
  from tablename  
) 
select * from cte
where rn = 1

See the demo.
The following also works in earlier versions, but it does not return ties, only the first best match:
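The rank() query can also be tried outside MySQL. The sketch below runs it against SQLite, which supports window functions from version 3.25 (an assumption about the SQLite library your Python uses). The table name cars replaces the placeholder tablename, the tie data is assumed as before, and the val5 literal is '270kmh' to match the stored values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT,"
    " val3 TEXT, val4 TEXT, val5 TEXT)"
)
# Assumed data: row 4 uses '290kmh' so that rows 1 and 4 tie.
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mercedes", "E", "black", "250hp", "270kmh"),
        (2, "Mercedes", "C", "white", "250hp", "240kmh"),
        (3, "BMW", "C", "white", "250hp", "250kmh"),
        (4, "PORSCHE", "E", "red", "300hp", "290kmh"),
    ],
)

# Each comparison evaluates to 1 or 0, so the sum is the number of matches;
# rank() gives every row with the top score rank 1, so ties are kept.
RANKED_SQL = """
with cte as (
  select *,
    rank() over (order by
      (val1 = 'PORSCHE') +
      (val2 = 'E') +
      (val3 = 'red') +
      (val4 = '250hp') +
      (val5 = '270kmh') desc
    ) rn
  from cars
)
select id, rn from cte
where rn = 1
order by id
"""

best = conn.execute(RANKED_SQL).fetchall()
print(best)  # [(1, 1), (4, 1)]
```

If any of the columns can be NULL, the comparison yields NULL and makes the whole score NULL; wrapping each term in coalesce(..., 0) keeps such rows comparable.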

select *
from tablename 
order by 
      (val1 = 'PORSCHE') + 
      (val2 = 'E') + 
      (val3 = 'red') + 
      (val4 = '250hp') + 
      (val5 = '270kmh') desc 
limit 1

See the demo.
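A sketch of the order by ... limit 1 variant in SQLite, with the same assumed tie data. When rows tie, which of them comes first is not guaranteed, so this sketch adds id as a secondary sort key (an addition, not part of the answer) to make the result deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cars (id INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT,"
    " val3 TEXT, val4 TEXT, val5 TEXT)"
)
# Assumed data: row 4 uses '290kmh' so that rows 1 and 4 tie.
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mercedes", "E", "black", "250hp", "270kmh"),
        (2, "Mercedes", "C", "white", "250hp", "240kmh"),
        (3, "BMW", "C", "white", "250hp", "250kmh"),
        (4, "PORSCHE", "E", "red", "300hp", "290kmh"),
    ],
)

LIMIT_SQL = """
select id
from cars
order by
      (val1 = 'PORSCHE') +
      (val2 = 'E') +
      (val3 = 'red') +
      (val4 = '250hp') +
      (val5 = '270kmh') desc,
      id  -- tie-breaker added for a deterministic result (not in the original)
limit 1
"""

best = conn.execute(LIMIT_SQL).fetchall()
print(best)  # [(1,)]
```

Row 4 scores as high as row 1 but is dropped, which is the "no ties" limitation described above.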

Based on the description, I think this would be the simplest solution:

select t.*
from (select t.*,
             rank() over (order by (t.val1 = @val1) + (t.val2 = @val2) + (t.val3 = @val3) + (t.val4 = @val4) + (t.val5 = @val5) desc) as seqnum
      from t
     ) t
where seqnum = 1;
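In application code, the @val1 … @val5 user variables would typically be replaced by bound query parameters. The sketch below assumes SQLite's named :val1 placeholder style via Python's sqlite3 module (MySQL drivers use their own placeholder syntax) and a table named t as in the answer; the helper best_matches and the data are hypothetical.

```python
import sqlite3

# Hypothetical table "t" with the question's columns (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT,"
    " val3 TEXT, val4 TEXT, val5 TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mercedes", "E", "black", "250hp", "270kmh"),
        (2, "Mercedes", "C", "white", "250hp", "240kmh"),
        (3, "BMW", "C", "white", "250hp", "250kmh"),
        (4, "PORSCHE", "E", "red", "300hp", "290kmh"),
    ],
)

# Named parameters :val1 .. :val5 take the place of @val1 .. @val5.
RANKED_SQL = """
select t.id
from (select t.*,
             rank() over (order by (t.val1 = :val1) + (t.val2 = :val2) + (t.val3 = :val3)
                                 + (t.val4 = :val4) + (t.val5 = :val5) desc) as seqnum
      from t
     ) t
where seqnum = 1
order by t.id
"""


def best_matches(conn, search):
    """Return the ids of all rows tied for the most matching columns."""
    return [row[0] for row in conn.execute(RANKED_SQL, search)]


print(best_matches(conn, {"val1": "PORSCHE", "val2": "E", "val3": "red",
                          "val4": "250hp", "val5": "270kmh"}))  # [1, 4]
print(best_matches(conn, {"val1": "BMW", "val2": "C", "val3": "white",
                          "val4": "250hp", "val5": "250kmh"}))  # [3]
```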

In versions of MySQL before 8.0, this is a little more complicated, but not that bad:

select t.*
from t
where ( (t.val1 = @val1) + (t.val2 = @val2) + (t.val3 = @val3) + (t.val4 = @val4) + (t.val5 = @val5)
      ) = (select max( (t2.val1 = @val1) + (t2.val2 = @val2) + (t2.val3 = @val3) + (t2.val4 = @val4) + (t2.val5 = @val5) )
          from t t2
         );
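The pre-8.0 version (with its parentheses balanced) can be parameterized the same way. In this sketch each named parameter appears twice in the statement, which SQLite's named placeholders allow; the table t, the data, and the helper name best_matches_no_window are assumptions.

```python
import sqlite3

# Hypothetical table "t" (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, val1 TEXT, val2 TEXT,"
    " val3 TEXT, val4 TEXT, val5 TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Mercedes", "E", "black", "250hp", "270kmh"),
        (2, "Mercedes", "C", "white", "250hp", "240kmh"),
        (3, "BMW", "C", "white", "250hp", "250kmh"),
        (4, "PORSCHE", "E", "red", "300hp", "290kmh"),
    ],
)

# No window function: compare each row's score with the maximum score,
# computed by a scalar subquery over a second alias t2 of the same table.
BEST_SQL = """
select t.id
from t
where ( (t.val1 = :val1) + (t.val2 = :val2) + (t.val3 = :val3)
        + (t.val4 = :val4) + (t.val5 = :val5)
      ) = (select max( (t2.val1 = :val1) + (t2.val2 = :val2) + (t2.val3 = :val3)
                       + (t2.val4 = :val4) + (t2.val5 = :val5) )
           from t t2
          )
order by t.id
"""


def best_matches_no_window(conn, search):
    """Return the ids of all rows whose score equals the maximum score."""
    return [row[0] for row in conn.execute(BEST_SQL, search)]


print(best_matches_no_window(conn, {"val1": "PORSCHE", "val2": "E", "val3": "red",
                                    "val4": "250hp", "val5": "270kmh"}))  # [1, 4]
```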
