Group and choose max pair SQL
I have a table with two columns.
create table txns(
person varchar(255),
fruit varchar(255)
);
This is a log table. I have a sqlfiddle here.
This is as far as I have been able to get with the SQL query.
In essence, I want to know, for every person, which fruit he has eaten most often.
I have both Oracle and MySQL available. In the future this will also be deployed on Hadoop (via Hive, Impala, etc.), so a database-agnostic answer would be best. But please do provide a database-specific answer if that is the only option.
Oracle 11g R2 Schema Setup:
create table txns(
person varchar(255),
fruit varchar(255)
);
insert into txns
values ('alpha','apple');
insert into txns
values ('charlie','cherry');
insert into txns
values ('bravo','banana');
insert into txns
values ('alpha','apple');
insert into txns
values ('bravo','banana');
insert into txns
values ('alpha','apricot');
insert into txns
values ('bravo','berry');
Query 1:
with tab as (
select person, fruit,count(1) cnt,
max(count(1)) over (partition by person) m_cnt
from txns
group by person, fruit)
select person, fruit, cnt, m_cnt
from tab
where cnt = m_cnt
| PERSON | FRUIT | CNT | M_CNT |
|---------|--------|-----|-------|
| alpha | apple | 2 | 2 |
| bravo | banana | 2 | 2 |
| charlie | cherry | 1 | 1 |
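Query 1 returns every fruit that ties for a person's top count. If exactly one row per person is wanted (an assumption about the requirement, not something the question states), a possible variation is sketched below. It swaps max(...) over for row_number() and uses the fruit name as an arbitrary tie-breaker; the alias rn is hypothetical.

```sql
-- One row per person, even when several fruits tie for the top count.
-- Ties are broken alphabetically by fruit (an arbitrary choice for illustration).
with tab as (
    select person,
           fruit,
           count(1) as cnt,
           row_number() over (partition by person
                              order by count(1) desc, fruit) as rn
    from txns
    group by person, fruit
)
select person, fruit, cnt
from tab
where rn = 1
order by person;
```

On the sample data no person has a tie, so this returns the same three rows as Query 1.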
For Oracle -
select x.person,x.fruit
from ( select person, fruit, count(*) ct,rank() over (partition by person
order by count(*) desc) as rank
from txns
group by person, fruit) x
where rank=1;
Here the inner query groups by person and fruit, takes count(*), and ranks each person's fruits by that count. The fruit that holds the highest RANK, or position, is the most eaten one: the descending order of count(*) places the most eaten fruit at RANK = 1 for each person (partition by person). The outer query then selects the first RANK for each person, which is essentially the fruit that person has eaten most. This is a perfect example of Oracle's analytic function RANK().
Why RANK, you ask? Your boss may change his mind and ask you, "Hey prog_guy, I changed my mind. I not only want the most eaten fruit for each fruit eater, I also want the third most eaten fruit." What do you do? Scramble to write another query? No, you take the same query, change rank=1 to rank in (1,3), and BAM! You now have the most eaten fruit and the third most eaten fruit (if any) for each fruit eater.
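As a rough sketch of that change, here is the Oracle query above with the filter widened. The alias rnk is my substitution for rank (an assumption, chosen because RANK is a reserved word in some engines such as MySQL 8); the rest follows the original query.

```sql
-- Most eaten and third most eaten fruit per person.
-- rnk is a hypothetical alias used in place of rank.
select x.person, x.fruit, x.ct, x.rnk
from (
    select person,
           fruit,
           count(*) as ct,
           rank() over (partition by person
                        order by count(*) desc) as rnk
    from txns
    group by person, fruit
) x
where x.rnk in (1, 3)
order by x.person, x.rnk;
```

On the sample data nobody has eaten three different fruits, so only the rank 1 rows come back. Also note that RANK() leaves gaps after ties: if a person's second and third fruits tie, both get rank 2 and no row has rank 3. DENSE_RANK() is the usual alternative when consecutive ranks are needed.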
AND / OR
Your boss may change his mind again and say, "Hey prog_guy, forget about the most eaten fruits, now I want you to get the least eaten fruit." What do you do? Scramble again for a new query? Nope! You change desc to asc in the order by of the over (partition by person ...) clause, and BAM! You now have the least eaten fruit for each fruit eater.
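A minimal sketch of the ascending variant, again using the hypothetical alias rnk instead of rank:

```sql
-- Least eaten fruit per person: same query, count(*) sorted ascending.
select x.person, x.fruit, x.ct
from (
    select person,
           fruit,
           count(*) as ct,
           rank() over (partition by person
                        order by count(*) asc) as rnk
    from txns
    group by person, fruit
) x
where x.rnk = 1
order by x.person;
```

On the sample data this gives apricot for alpha, berry for bravo, and cherry for charlie. Only fruits that appear in txns for a person are considered, so a fruit the person never ate cannot show up as the least eaten.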
Some details about the MySQL equivalent of Oracle's RANK() are here. A little information about the RANK() equivalent in Hive is here.
The following query runs in both Oracle and MySQL.
select k.person, k.fruit from
(
select person,fruit,count(fruit) as cnt
from txns
group by person,fruit
) k
join
(
select t.person,max(t.cnt) mxCnt
from
(
select person,fruit,count(fruit) as cnt
from txns
group by person,fruit
)t
group by t.person
) s
on s.person = k.person
and s.mxCnt = k.cnt
order by k.person