Group by and select the max pair in SQL

Question

I have a table with two columns.

create table txns( 
  person varchar(255),
  fruit varchar(255)
  );

This is a log table. I have a sqlfiddle here.

This is as far as I am able to get with the SQL query.

In essence: for every person, which fruit has he eaten most frequently?

I have both Oracle and MySQL available. In the future, this will also be deployed on Hadoop (via Hive, Impala, etc.), so a database-agnostic answer would be best. If only a database-specific answer exists, please provide that too.

Answer 1

SQL Fiddle

Oracle 11g R2 Schema Setup:

create table txns(
  person varchar(255),
  fruit varchar(255)
  );

insert into txns
values ('alpha','apple');

insert into txns
values ('charlie','cherry');

insert into txns
values ('bravo','banana');

insert into txns
values ('alpha','apple');

insert into txns
values ('bravo','banana');

insert into txns
values ('alpha','apricot');

insert into txns
values ('bravo','berry');

Query 1:

with tab as (
select person, fruit,count(1) cnt,  
       max(count(1)) over (partition by person) m_cnt
  from txns
 group by person, fruit)
select person, fruit, cnt, m_cnt
  from tab
 where cnt = m_cnt

Results:

|  PERSON |  FRUIT | CNT | M_CNT |
|---------|--------|-----|-------|
|   alpha |  apple |   2 |     2 |
|   bravo | banana |   2 |     2 |
| charlie | cherry |   1 |     1 |

Answer 2

For Oracle:

select x.person,x.fruit
from ( select person, fruit, count(*) ct,rank() over (partition by person 
                                                      order by count(*) desc) as rank
         from txns
group by person, fruit) x
where rank=1;

SQL Fiddle

The "non DB centric" idea is:

First, find out for each person how many times a particular fruit appears in the table (that is, how many times the fruit was eaten). This is done by grouping on person, fruit and taking count(*).
Then find out which fruit each person ate most, that is, the fruit that holds the highest RANK. Here, ordering by count(*) in DESC order places the most eaten fruit at RANK = 1 for each person (partition by person).
Once the ranking is done, select the rows with RANK = 1 for each person; these are the fruits that person ate most.
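The three steps above do not depend on any particular database. As a rough illustration only, here is a minimal pure-Python sketch of the same logic. The names `TXNS` and `ranked_counts` are made up for this example, the rows are the sample data from Answer 1, and the tie handling is meant to mimic `RANK()` (tied counts share a rank).

```python
from collections import Counter

# Sample rows copied from the schema setup in Answer 1: (person, fruit).
TXNS = [
    ("alpha", "apple"), ("charlie", "cherry"), ("bravo", "banana"),
    ("alpha", "apple"), ("bravo", "banana"), ("alpha", "apricot"),
    ("bravo", "berry"),
]


def ranked_counts(rows):
    """Return (person, fruit, count, rank) tuples, ranked within each person."""
    # Step 1: the equivalent of GROUP BY person, fruit with COUNT(*).
    counts = Counter(rows)

    by_person = {}
    for (person, fruit), ct in counts.items():
        by_person.setdefault(person, []).append((fruit, ct))

    result = []
    for person, fruits in by_person.items():
        # Step 2: order by count descending inside each person's partition.
        fruits.sort(key=lambda fc: fc[1], reverse=True)
        rank, prev = 0, None
        for position, (fruit, ct) in enumerate(fruits, start=1):
            if ct != prev:
                # A new count value takes its position as rank; ties reuse the rank.
                rank, prev = position, ct
            result.append((person, fruit, ct, rank))
    return result


# Step 3: keep only the rows ranked first for each person.
top = sorted((p, f) for p, f, ct, r in ranked_counts(TXNS) if r == 1)
print(top)
# [('alpha', 'apple'), ('bravo', 'banana'), ('charlie', 'cherry')]
```

If two fruits tie for the highest count, both come back with rank 1, which matches what the `rank=1` filter in the Oracle query would return.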

This is a perfect example of Oracle's analytic function RANK().

Why use `RANK`, you ask?

Your boss may change his mind and say, "Hey prog_guy, I changed my mind. I not only want the fruit each fruit eater ate most, I also want the 3rd most eaten fruit." What do you do? Scramble to write another query? No, you take the same query, change rank=1 to rank in (1,3), and BAM! You now have the most eaten fruit and the third most eaten fruit (if any) for each fruit eater.

AND / OR

Your boss may change his mind again and say, "Hey prog_guy, forget about the most eaten fruits, now I want the least eaten fruit." What do you do? Scramble again for a new query? Nope! You change desc to asc in the order by inside the over (partition by ...) clause, and BAM! You now have the least eaten fruit for each fruit eater.

Some details about an equivalent of Oracle's RANK() in MySQL are here. A little information on the RANK() equivalent in Hive is here.
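To make the two "boss changes his mind" scenarios concrete, here is a small pure-Python sketch, not the Oracle query itself. `pick_fruits`, its `ranks` and `descending` parameters, and `TXNS` are hypothetical names for this illustration; the rank is computed as one plus the number of fruits that sort strictly ahead, which is how `RANK()` treats ties.

```python
from collections import Counter, defaultdict

# Sample rows copied from the schema setup in Answer 1: (person, fruit).
TXNS = [
    ("alpha", "apple"), ("charlie", "cherry"), ("bravo", "banana"),
    ("alpha", "apple"), ("bravo", "banana"), ("alpha", "apricot"),
    ("bravo", "berry"),
]


def pick_fruits(rows, ranks=(1,), descending=True):
    """Return sorted (person, fruit, rank) tuples whose rank is in `ranks`."""
    counts = Counter(rows)  # count per (person, fruit)
    per_person = defaultdict(list)
    for (person, fruit), ct in counts.items():
        per_person[person].append((fruit, ct))

    picked = []
    for person, fruits in per_person.items():
        for fruit, ct in fruits:
            # Rank = 1 + number of fruits that sort strictly ahead of this one.
            if descending:
                rank = 1 + sum(1 for _, other in fruits if other > ct)
            else:
                rank = 1 + sum(1 for _, other in fruits if other < ct)
            if rank in ranks:
                picked.append((person, fruit, rank))
    return sorted(picked)


print(pick_fruits(TXNS))                    # most eaten fruit per person
print(pick_fruits(TXNS, ranks=(1, 3)))      # most and third most eaten
print(pick_fruits(TXNS, descending=False))  # least eaten fruit per person
```

With this sample data nobody has a fruit at rank 3, so the second call returns the same rows as the first; the third call returns apricot, berry, and cherry.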

Answer 3

The following query runs in both Oracle and MySQL.

select k.person, k.fruit from
(
  select person,fruit,count(fruit) as cnt
  from txns
  group by person,fruit
) k
join
(
  select t.person,max(t.cnt) mxCnt
  from
  (
    select person,fruit,count(fruit) as cnt
    from txns
    group by person,fruit
  )t
group by t.person
) s
on s.person = k.person
and s.mxCnt = k.cnt 
order by k.person
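For readers who want to trace what the join does, here is a rough pure-Python sketch of the same three pieces: the per-pair counts (subquery `k`), the per-person maximum (subquery `s`), and the join on person and count. `TXNS`, `counts`, `max_per_person`, and `result` are names chosen for this example, and the rows are the sample data from Answer 1. Note that `count(fruit)` in SQL skips NULL fruits; the sample data has none, so the sketch does not model that.

```python
from collections import Counter

# Sample rows copied from the schema setup in Answer 1: (person, fruit).
TXNS = [
    ("alpha", "apple"), ("charlie", "cherry"), ("bravo", "banana"),
    ("alpha", "apple"), ("bravo", "banana"), ("alpha", "apricot"),
    ("bravo", "berry"),
]

# Subquery k (and the inner subquery t): count per (person, fruit).
counts = Counter(TXNS)

# Subquery s: the maximum count per person.
max_per_person = {}
for (person, _fruit), cnt in counts.items():
    max_per_person[person] = max(cnt, max_per_person.get(person, 0))

# Join on person and cnt = mxCnt, then order by person.
result = sorted(
    (person, fruit)
    for (person, fruit), cnt in counts.items()
    if cnt == max_per_person[person]
)
print(result)
# [('alpha', 'apple'), ('bravo', 'banana'), ('charlie', 'cherry')]
```

Like the SQL version, this returns every tied fruit when a person has more than one fruit at the maximum count.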

3 answers

Answer 1: 1, 2014-02-12 06:54:56
Answer 2: 1, 2014-02-12 06:57:32
Answer 3: 1, accepted, 2014-02-12 07:05:22
