MySQL SELECT row with maximal similarity
I have a table like this:
id | val1 | val2 | val3 | val4 | val5
How can I find the row(s) with the maximum number of matching values (not all values have to be equal)?
Example: say I have 4 rows in the table:
1 | Mercedes | E | black | 250hp | 270kmh |
2 | Mercedes | C | white | 250hp | 240kmh |
3 | BMW | C | white | 250hp | 250kmh |
4 | PORSCHE | E | red | 300hp | 290kmh |
I select:
val1=PORSCHE val2=E val3=red val4=250hp val5=270kmh
and get:
1 | Mercedes | E | black | 250hp | 270kmh |
4 | PORSCHE | E | red | 300hp | 290kmh |
because both have 3 equal fields.
Also, the question is not really about cars, and I hope to do this with a single table.
The table is used to check a user's hardware and compare whether it is exactly equal, or how many fields are equal.
I have re-created your case locally with the following sample data model:
CREATE TABLE `cars` (
`id` int(11) NOT NULL,
`val1` varchar(45) DEFAULT NULL,
`val2` varchar(45) DEFAULT NULL,
`val3` varchar(45) DEFAULT NULL,
`val4` varchar(45) DEFAULT NULL,
`val5` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('1', 'Mercedes', 'E', 'black', '250hp', '270kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('2', 'Mercedes', 'C', 'white', '250hp', '240kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('3', 'BMW', 'C', 'white', '250hp', '250kmh');
INSERT INTO `cars` (`id`, `val1`, `val2`, `val3`, `val4`, `val5`) VALUES ('4', 'PORSCHE', 'E', 'red', '300hp', '270kmh');
To get your "similarity votes" you could do something like this:
select id, count(*) as votes from
(
select id from cars where val1 = 'PORSCHE'
union all
select id from cars where val2 = 'E'
union all
select id from cars where val3 = 'red'
union all
select id from cars where val4 = '250hp'
union all
select id from cars where val5 = '270kmh'
) as votes
group by id
With the sample data above (note that row 4 here has val5 = '270kmh', not '290kmh' as in the question), this generates:
id votes
1 3
2 1
3 1
4 4
Now comes the tricky part: we only want the "maximal votes" (best fit). The challenge here is that we need the votes query above twice: once to extract the maximum, and a second time to determine the ids associated with the maximal votes. If you only wanted the "first best match", you could use order by votes desc limit 1. If you want to get all ids which have the highest votes, then you could do something like:
select * from (
select id, count(*) as votes from
(
select id from cars where val1 = 'PORSCHE'
union all
select id from cars where val2 = 'E'
union all
select id from cars where val3 = 'red'
union all
select id from cars where val4 = '250hp'
union all
select id from cars where val5 = '270kmh'
) as votes
group by id
) hits where votes = (
select max(votes) from (
select id, count(*) as votes from
(
select id from cars where val1 = 'PORSCHE'
union all
select id from cars where val2 = 'E'
union all
select id from cars where val3 = 'red'
union all
select id from cars where val4 = '250hp'
union all
select id from cars where val5 = '270kmh'
) as votes
group by id
) as hits
)
Unfortunately, this duplicates the selection query (which also has to be computed twice). There is a large discussion on how best to solve such a problem at "SQL select only rows with max value on a column".
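On MySQL 8.0 or later, one way to avoid writing the votes query twice is a common table expression, since a CTE can be referenced more than once in the same statement. The following is only a sketch under that assumption; the names hits and matches are illustrative, and the cars table is the sample model from above.

```sql
-- Sketch only: assumes MySQL 8.0+ (CTE support) and the sample cars table.
-- "hits" and "matches" are illustrative names, not part of the original query.
with hits as (
    select id, count(*) as votes
    from (
        select id from cars where val1 = 'PORSCHE'
        union all
        select id from cars where val2 = 'E'
        union all
        select id from cars where val3 = 'red'
        union all
        select id from cars where val4 = '250hp'
        union all
        select id from cars where val5 = '270kmh'
    ) as matches
    group by id
)
-- The CTE is written once but used twice: for the rows and for the maximum.
select id, votes
from hits
where votes = (select max(votes) from hits);
```

As with the union all query itself, rows that match none of the five values never appear in hits, so a table with no matches at all returns an empty result.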
In your case I would also consider writing the "similarity votes" to a temporary table (if you expect many rows to be compared). Whether this is appropriate depends on what kind of database access your application has.
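A rough sketch of the temporary table idea might look like the following. The table name tmp_votes and the variable @max_votes are made up for illustration. The maximum is read into a user variable first because MySQL has traditionally not allowed a TEMPORARY table to be referenced more than once in the same query; treat that as an assumption to check against your server version.

```sql
-- Sketch only: tmp_votes and @max_votes are hypothetical names.
-- Requires the privilege to create temporary tables in the current schema.
create temporary table tmp_votes as
select id, count(*) as votes
from (
    select id from cars where val1 = 'PORSCHE'
    union all
    select id from cars where val2 = 'E'
    union all
    select id from cars where val3 = 'red'
    union all
    select id from cars where val4 = '250hp'
    union all
    select id from cars where val5 = '270kmh'
) as matches
group by id;

-- Step 1: read the maximum once into a session variable.
set @max_votes = (select max(votes) from tmp_votes);

-- Step 2: fetch every id that reaches that maximum (ties included).
select id, votes
from tmp_votes
where votes = @max_votes;

-- Temporary tables vanish when the session ends; dropping explicitly is optional.
drop temporary table tmp_votes;
```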
This is a solution for MySQL 8.0+ using the rank() window function, which returns the best-matching rows, including ties:
with cte as (
select *,
rank() over (order by
(val1 = 'PORSCHE') +
(val2 = 'E') +
(val3 = 'red') +
(val4 = '250hp') +
(val5 = '270kmh') desc
) rn
from tablename
)
select * from cte
where rn = 1
See the demo.
And this will work for previous versions, but it will not return ties, just the first best match:
select *
from tablename
order by
(val1 = 'PORSCHE') +
(val2 = 'E') +
(val3 = 'red') +
(val4 = '250hp') +
(val5 = '270kmh') desc
limit 1
Based on the description, I think this would be the simplest solution:
select t.*
from (select t.*,
rank() over (order by (t.val1 = @val1) + (t.val2 = @val2) + (t.val3 = @val3) + (t.val4 = @val4) + (t.val5 = @val5) desc) as seqnum
from t
) t
where seqnum = 1;
In versions of MySQL before version 8, this is a little more complicated, but not that bad:
select t.*
from t
where ( (t.val1 = @val1) + (t.val2 = @val2) + (t.val3 = @val3) + (t.val4 = @val4) + (t.val5 = @val5)
) = (select max( (t2.val1 = @val1) + (t2.val2 = @val2) + (t2.val3 = @val3) + (t2.val4 = @val4) + (t2.val5 = @val5) )
from t t2
);
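One caveat with summing comparisons: if a column can be NULL (the sample model earlier in the thread declares val1..val5 as DEFAULT NULL), then val = 'x' evaluates to NULL for that row and the whole sum becomes NULL. A possible workaround is MySQL's NULL-safe equality operator <=>, which always returns 0 or 1. The sketch below assumes the cars sample table and user variables set by hand; the match_count column is added so you can also see how many fields are equal (5 would mean the row is identical in all five fields).

```sql
-- Sketch only: assumes the sample cars table; the @val variables are
-- session variables set here purely for illustration.
set @val1 = 'PORSCHE', @val2 = 'E', @val3 = 'red',
    @val4 = '250hp', @val5 = '270kmh';

-- <=> is NULL-safe equality: it yields 1 or 0, never NULL,
-- so a NULL column simply contributes 0 to the score.
select c.*,
       (c.val1 <=> @val1) + (c.val2 <=> @val2) + (c.val3 <=> @val3) +
       (c.val4 <=> @val4) + (c.val5 <=> @val5) as match_count
from cars c
where (c.val1 <=> @val1) + (c.val2 <=> @val2) + (c.val3 <=> @val3) +
      (c.val4 <=> @val4) + (c.val5 <=> @val5)
    = (select max((c2.val1 <=> @val1) + (c2.val2 <=> @val2) + (c2.val3 <=> @val3) +
                  (c2.val4 <=> @val4) + (c2.val5 <=> @val5))
       from cars c2);
```

Note that NULL <=> NULL is 1, so if a search variable is itself NULL, rows with NULL in that column count as a match. Whether that is what you want depends on how missing hardware values should be treated.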