MySQL: Select N rows, but with only unique values in one column

Given this data set:

ID  Name            City            Birthyear
1   Egon Spengler   New York        1957
2   Mac Taylor      New York        1955
3   Sarah Connor    Los Angeles     1959
4   Jean-Luc Picard La Barre        2305
5   Ellen Ripley    Nostromo        2092
6   James T. Kirk   Riverside       2233
7   Henry Jones     Chicago         1899

I need to find the 3 oldest persons, but only one from each city.

If it were just the three oldest, it would be...

  • Henry Jones / Chicago
  • Mac Taylor / New York
  • Egon Spengler / New York

However, since both Egon Spengler and Mac Taylor are located in New York, Egon Spengler would drop out and the next one (Sarah Connor / Los Angeles) would come in instead.
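
To make the requirement concrete, here is a minimal sketch in plain Python rather than SQL. It sorts by birth year, keeps the first person seen for each city, and stops after three. The function name `oldest_per_city` and the tie-break on ID are assumptions made for illustration only.

```python
people = [
    (1, "Egon Spengler", "New York", 1957),
    (2, "Mac Taylor", "New York", 1955),
    (3, "Sarah Connor", "Los Angeles", 1959),
    (4, "Jean-Luc Picard", "La Barre", 2305),
    (5, "Ellen Ripley", "Nostromo", 2092),
    (6, "James T. Kirk", "Riverside", 2233),
    (7, "Henry Jones", "Chicago", 1899),
]


def oldest_per_city(rows, n):
    """Return the n oldest people, keeping at most one person per city."""
    seen_cities = set()
    picked = []
    # Lowest birth year first; ID breaks ties so the result is deterministic.
    for row in sorted(rows, key=lambda r: (r[3], r[0])):
        city = row[2]
        if city in seen_cities:
            continue  # an older person from this city was already picked
        seen_cities.add(city)
        picked.append(row)
        if len(picked) == n:
            break
    return picked


result = oldest_per_city(people, 3)
for _, name, city, year in result:
    print(f"{name} / {city} ({year})")
```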

Any elegant solutions?

Update:

Currently a variation of PConroy's query is the best/fastest solution:

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 ON P2.Birthyear = P.Birthyear
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

His original query with "IN" is extremely slow with big datasets (I aborted it after 5 minutes), but moving the subquery to a JOIN speeds it up a lot. It took about 0.15 seconds for approx. 1 million rows in my test environment. I have an index on "City, Birthyear" and a second one just on "Birthyear".

Note: This is related to...

Probably not the most elegant of solutions, and the performance of IN may suffer on larger tables.

The nested query gets the minimum Birthyear for each city. Only records with that Birthyear are matched in the outer query. Ordering by birth year and then limiting to 3 results gets you the 3 oldest people who are also the oldest in their city (Egon Spengler drops out).

SELECT Name, City, Birthyear, COUNT(*) AS ct
FROM table
WHERE Birthyear IN (SELECT MIN(Birthyear)
               FROM table
               GROUP by City)
GROUP BY City
ORDER BY Birthyear ASC LIMIT 3;

+-----------------+-------------+------+----+
| name            | city        | year | ct |
+-----------------+-------------+------+----+
| Henry Jones     | Chicago     | 1899 | 1  |
| Mac Taylor      | New York    | 1955 | 1  |
| Sarah Connor    | Los Angeles | 1959 | 1  |
+-----------------+-------------+------+----+

Edit - added GROUP BY City to the outer query, because people with the same birth year would otherwise return multiple rows. Grouping in the outer query ensures that only one result is returned per city, even if more than one person has that minimum Birthyear. The ct column shows whether more than one person in the city has that Birthyear.
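
The effect of the outer GROUP BY and the ct column can be checked with a small sketch. It assumes Python's built-in `sqlite3` as a stand-in for MySQL, a table named `people`, and one extra New York row ("Blah", 1955) that ties with Mac Taylor. Name is left out of the SELECT list on purpose, because with two tied people in one group the engine may return either name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
        (8, "Blah", "New York", 1955),  # assumed extra row: ties with Mac Taylor
    ],
)

# Same shape as the IN query above, sorted ascending so the oldest come first.
rows = conn.execute(
    """
    SELECT City, MIN(Birthyear) AS Birthyear, COUNT(*) AS ct
    FROM people
    WHERE Birthyear IN (SELECT MIN(Birthyear) FROM people GROUP BY City)
    GROUP BY City
    ORDER BY Birthyear ASC
    LIMIT 3
    """
).fetchall()

for city, birthyear, ct in rows:
    print(city, birthyear, ct)
```

New York comes back once with ct = 2, which is the signal that two people share the minimum birth year there. On a MySQL server running with ONLY_FULL_GROUP_BY, selecting the bare Name column next to GROUP BY City would likely be rejected, so treat the original SELECT list as dependent on the server's SQL mode.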

This is probably not the most elegant or quickest solution, but it should work. I am looking forward to seeing the solutions of real database gurus.

select p.* from people p,
(select city, max(age) as mage from people group by city) t
where p.city = t.city and p.age = t.mage
order by p.age desc

Something like that?

SELECT
  Id, Name, City, Birthyear
FROM
  TheTable
WHERE
  Id IN (SELECT TOP 1 Id FROM TheTable i WHERE i.City = TheTable.City ORDER BY Birthyear)
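
TOP is SQL Server syntax, so the query above will not run on MySQL as written. A minimal sketch of the same correlated-subquery idea with LIMIT 1 follows, run on Python's built-in `sqlite3`. The table name `people`, the `=` comparison in place of IN, and the extra tie-break on ID are assumptions. That this exact form also runs on MySQL is an assumption too, since it was only checked against SQLite here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
    ],
)

# For every row, the subquery finds the ID of the oldest person in the same
# city; the row is kept only if it is that person.
rows = conn.execute(
    """
    SELECT p.ID, p.Name, p.City, p.Birthyear
    FROM people AS p
    WHERE p.ID = (SELECT i.ID
                  FROM people AS i
                  WHERE i.City = p.City
                  ORDER BY i.Birthyear, i.ID
                  LIMIT 1)
    ORDER BY p.Birthyear
    LIMIT 3
    """
).fetchall()

for row in rows:
    print(row)
```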

Not pretty, but it should also work with multiple people who share the same dob:

Test data:

select id, name, city, dob 
into people
from
(select 1 id,'Egon Spengler' name, 'New York' city , 1957 dob
union all select 2, 'Mac Taylor','New York', 1955
union all select 3, 'Sarah Connor','Los Angeles', 1959
union all select 4, 'Jean-Luc Picard','La Barre', 2305
union all select 5, 'Ellen Ripley','Nostromo', 2092
union all select 6, 'James T. Kirk','Riverside', 2233
union all select 7, 'Henry Jones','Chicago', 1899
union all select 8, 'Blah','New York', 1955) a

Query:

select 
    * 
from 
    people p
    left join people p1
    ON 
        p.city = p1.city
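        -- AND binds tighter than OR, so the tie-break is wrapped in the same
        -- parentheses to keep it restricted to people from the same city
        and (p.dob > p1.dob
             or (p.dob = p1.dob and p.id > p1.id))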
where
    p1.id is null
order by 
    p.dob
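
The self-join above keeps a row only when no other row in the same city is older, or equally old with a lower id. A minimal sketch of that anti-join on Python's built-in `sqlite3` follows, using the test data from this answer. The explicit parentheses around the whole age comparison and the added LIMIT 3 are assumptions of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT, dob INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
        (8, "Blah", "New York", 1955),  # same dob as Mac Taylor
    ],
)

# p1 matches any "better" candidate from the same city: strictly older, or
# equally old with a lower id. Rows with no such match (p1.id IS NULL) win.
rows = conn.execute(
    """
    SELECT p.id, p.name, p.city, p.dob
    FROM people AS p
    LEFT JOIN people AS p1
           ON p.city = p1.city
          AND (p.dob > p1.dob
               OR (p.dob = p1.dob AND p.id > p1.id))
    WHERE p1.id IS NULL
    ORDER BY p.dob
    LIMIT 3
    """
).fetchall()

for row in rows:
    print(row)
```

Mac Taylor (id 2) wins the New York tie over "Blah" (id 8) because of the lower id, and Egon Spengler is dropped because an older New Yorker exists.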

@BlaM

UPDATE: I just found that it's good to use USING instead of ON; it removes the duplicate columns from the result.

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT City, MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 USING(Birthyear, City)
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;
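
The column difference between ON and USING can be seen by comparing the result columns of a `SELECT *` over both joins. This is a sketch on Python's built-in `sqlite3`, which is assumed here to treat USING the same way MySQL does in this respect; only the number of columns is compared, not their order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (7, "Henry Jones", "Chicago", 1899),
    ],
)

MIN_PER_CITY = "SELECT City, MIN(Birthyear) AS Birthyear FROM people GROUP BY City"

# With ON, SELECT * returns the join columns from both sides.
on_cursor = conn.execute(
    f"SELECT * FROM people P JOIN ({MIN_PER_CITY}) P2 "
    "ON P2.Birthyear = P.Birthyear AND P2.City = P.City"
)
on_columns = [d[0] for d in on_cursor.description]

# With USING, each join column appears only once.
using_cursor = conn.execute(
    f"SELECT * FROM people P JOIN ({MIN_PER_CITY}) P2 USING (Birthyear, City)"
)
using_columns = [d[0] for d in using_cursor.description]

print(len(on_columns), len(using_columns))
```

The difference only shows up with `SELECT *`. A select list of `P.*`, as in the query above, already restricts the output to the columns of `people`.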

ORIGINAL POST

Hi, I tried to use your updated query, but I was getting wrong results until I added an extra condition to the join (and an extra column to the joined subselect). Applied to your query, I'm using this:

SELECT P.*, COUNT(*) AS ct
   FROM people P
   JOIN (SELECT City, MIN(Birthyear) AS Birthyear
              FROM people 
              GROUP by City) P2 ON P2.Birthyear = P.Birthyear AND P2.City = P.City
   GROUP BY P.City
   ORDER BY P.Birthyear ASC 
   LIMIT 10;

In theory you should not need the last GROUP BY P.City, but I've left it there for now, just in case. I will probably remove it later.
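
Why the extra City condition matters can be shown with one additional row. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL and adds a hypothetical person, Peter Venkman, born in 1955 in Chicago. He is not the oldest in Chicago, but his birth year equals New York's minimum, so a join on Birthyear alone lets him through. The helper `names` and the extra row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (ID INTEGER PRIMARY KEY, Name TEXT, City TEXT, Birthyear INTEGER)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "Egon Spengler", "New York", 1957),
        (2, "Mac Taylor", "New York", 1955),
        (3, "Sarah Connor", "Los Angeles", 1959),
        (4, "Jean-Luc Picard", "La Barre", 2305),
        (5, "Ellen Ripley", "Nostromo", 2092),
        (6, "James T. Kirk", "Riverside", 2233),
        (7, "Henry Jones", "Chicago", 1899),
        (8, "Peter Venkman", "Chicago", 1955),  # hypothetical row
    ],
)

MIN_PER_CITY = "SELECT City, MIN(Birthyear) AS Birthyear FROM people GROUP BY City"


def names(join_condition):
    """Names returned by the derived-table join for a given ON condition."""
    sql = f"""
        SELECT P.Name
        FROM people P
        JOIN ({MIN_PER_CITY}) P2 ON {join_condition}
        ORDER BY P.Birthyear, P.ID
    """
    return [name for (name,) in conn.execute(sql)]


birthyear_only = names("P2.Birthyear = P.Birthyear")
birthyear_and_city = names("P2.Birthyear = P.Birthyear AND P2.City = P.City")

print(birthyear_only)
print(birthyear_and_city)
```

With the Birthyear-only join, Peter Venkman appears in the result even though Henry Jones is the oldest person in Chicago. With both conditions he is filtered out. Note that two people sharing the minimum birth year within one city would still both match, so the outer GROUP BY P.City remains the part that collapses such ties.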
