
Alternative to using GROUP BY without aggregates to retrieve distinct "best" result

I'm trying to retrieve the "best" possible entry from an SQL table.

Consider a table containing TV shows with the columns id, title, episode, is_hidef, is_verified, e.g.:

id title         ep hidef verified
1  The Simpsons  1  True  False
2  The Simpsons  1  True  True
3  The Simpsons  1  True  True
4  The Simpsons  2  False False
5  The Simpsons  2  True  False

There may be duplicate rows for a single title and episode, which may or may not have different values for the boolean fields. There may be more columns containing additional info, but that's unimportant.

I want a result set that gives me the best row (so is_hidef and is_verified are both "true" where possible) for each episode. For rows considered "equal" I want the most recent row (natural ordering, or order by an arbitrary datetime column). For the table above, that would be:

3  The Simpsons  1  True  True
5  The Simpsons  2  True  False

In the past I would have used the following query:

SELECT * FROM shows WHERE title='The Simpsons' GROUP BY episode ORDER BY is_hidef, is_verified

This works under MySQL and SQLite, but goes against the SQL spec (GROUP BY requiring aggregates, etc.). I'm not really interested in hearing again why MySQL is so bad for allowing this, but I'm very interested in finding an alternative solution that will work on other engines too (bonus points if you can give me the Django ORM code for it).

Thanks =)

In some way similar to Andomar's, but this one really works.

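select C.*
FROM
(
    -- 2nd level: among the best-scoring rows, pick one id per episode
    -- (min here; use max(ID) instead to prefer the most recent row)
    select min(ID) minid
    from (
        -- 1st level: best score per title and episode
        select title, ep, max(hidef*1 + verified*1) ord
        from tbl
        group by title, ep) a
    inner join tbl b on b.title=a.title and b.ep=a.ep and b.hidef*1 + b.verified*1 = a.ord
    group by a.title, a.ep, a.ord
) D inner join tbl C on D.minid = C.id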

The first-level tie-break converts bits (SQL Server) or MySQL booleans to integer values using *1, and the columns are added to produce the "best" value. You can give them weights, e.g. if hidef matters more than verified, use hidef*2 + verified*1, which can produce 3, 2, 1 or 0.
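To make the scoring concrete, here is a small sketch that runs both expressions against the sample rows. It assumes SQLite via Python's sqlite3 module, stores the booleans as 1/0, and adds one hypothetical row (id 6, not verified by the question's data) so the effect of the weights is visible.

```python
import sqlite3

# In-memory copy of the sample table; 1/0 stand in for True/False.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT, ep INTEGER, "
    "hidef INTEGER, verified INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
    [
        (1, "The Simpsons", 1, 1, 0),
        (2, "The Simpsons", 1, 1, 1),
        (3, "The Simpsons", 1, 1, 1),
        (4, "The Simpsons", 2, 0, 0),
        (5, "The Simpsons", 2, 1, 0),
        # Hypothetical extra row (not in the question): verified but not hidef.
        (6, "The Simpsons", 2, 0, 1),
    ],
)

# plain:    both flags count equally (0..2)
# weighted: hidef outranks verified (0..3)
scores = conn.execute(
    "SELECT id, hidef*1 + verified*1 AS plain, hidef*2 + verified*1 AS weighted "
    "FROM tbl ORDER BY id"
).fetchall()

for row in scores:
    print(row)
```

Rows 5 and 6 tie on the plain score (1 each), but the weighted score separates them (2 versus 1), which is the point of weighting.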

The second level looks among the rows matching the "best" score and extracts the minimum ID (or some other tie-break column). This is essential to reduce a multi-match result set to just one record.

In this particular case (table schema), the outer select uses the direct key to retrieve the matched records.
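For anyone who wants to try the two-level approach, here is a rough sketch that runs it on the sample data. It assumes SQLite through Python's sqlite3 module with 1/0 for the booleans; the helper names `make_sample_db` and `best_ids` are made up for illustration, and the `pick` switch is an addition to show how the second-level tie-break changes the result.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory table holding the five sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, title TEXT, ep INTEGER, "
        "hidef INTEGER, verified INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?, ?, ?)",
        [
            (1, "The Simpsons", 1, 1, 0),
            (2, "The Simpsons", 1, 1, 1),
            (3, "The Simpsons", 1, 1, 1),
            (4, "The Simpsons", 2, 0, 0),
            (5, "The Simpsons", 2, 1, 0),
        ],
    )
    return conn


def best_ids(conn, pick="MIN"):
    """Return one id per (title, ep): best score first, then MIN or MAX id."""
    # pick is interpolated into the SQL text, so only allow the two aggregates.
    if pick not in ("MIN", "MAX"):
        raise ValueError("pick must be 'MIN' or 'MAX'")
    sql = f"""
        SELECT c.id
        FROM (
            SELECT {pick}(b.id) AS pick_id
            FROM (
                SELECT title, ep, MAX(hidef*1 + verified*1) AS ord
                FROM tbl
                GROUP BY title, ep
            ) a
            JOIN tbl b
              ON b.title = a.title AND b.ep = a.ep
             AND b.hidef*1 + b.verified*1 = a.ord
            GROUP BY a.title, a.ep, a.ord
        ) d
        JOIN tbl c ON c.id = d.pick_id
        ORDER BY c.id
    """
    return [row[0] for row in conn.execute(sql)]


conn = make_sample_db()
print(best_ids(conn, "MIN"))  # -> [2, 5]
print(best_ids(conn, "MAX"))  # -> [3, 5]
```

With MIN the episode 1 tie between ids 2 and 3 resolves to 2; with MAX it resolves to 3, which matches the "most recent row" the question asks for.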

This is basically a form of the groupwise-maximum-with-ties problem. I don't think there is a SQL standard compliant solution. A solution like this would perform nicely:

SELECT  s2.id
,       s2.title
,       s2.episode
,       s2.is_hidef
,       s2.is_verified
FROM    (
        select  distinct title
        ,       episode
        from    shows
        where   title = 'The Simpsons' 
        ) s1
JOIN    shows s2
ON      s2.id = 
        (
        select  id
        from    shows s3
        where   s3.title = s1.title
                and s3.episode = s1.episode
        order by
                s3.is_hidef DESC
        ,       s3.is_verified DESC
        ,       s3.id DESC  -- tie-break: prefer the most recent row
        limit   1
        )

But given the cost to readability, I would stick with your original query.
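If you want to check the correlated-subquery version on the sample data, something like the sketch below should do. It assumes SQLite through Python's sqlite3 module (LIMIT is not available on every engine), stores the booleans as 1/0, and orders by `id DESC` as a final tie-break, which is an assumption made here to get the "most recent row" behaviour from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, episode INTEGER, "
    "is_hidef INTEGER, is_verified INTEGER)"
)
conn.executemany(
    "INSERT INTO shows VALUES (?, ?, ?, ?, ?)",
    [
        (1, "The Simpsons", 1, 1, 0),
        (2, "The Simpsons", 1, 1, 1),
        (3, "The Simpsons", 1, 1, 1),
        (4, "The Simpsons", 2, 0, 0),
        (5, "The Simpsons", 2, 1, 0),
    ],
)

sql = """
    SELECT s2.id, s2.title, s2.episode, s2.is_hidef, s2.is_verified
    FROM (
        SELECT DISTINCT title, episode
        FROM shows
        WHERE title = ?
    ) s1
    JOIN shows s2
      ON s2.id = (
        -- Best row for this title/episode: hidef first, then verified,
        -- then the highest id (assumed to mean "most recent").
        SELECT s3.id
        FROM shows s3
        WHERE s3.title = s1.title
          AND s3.episode = s1.episode
        ORDER BY s3.is_hidef DESC, s3.is_verified DESC, s3.id DESC
        LIMIT 1
      )
    ORDER BY s2.episode
"""

best = conn.execute(sql, ("The Simpsons",)).fetchall()
for row in best:
    print(row)
```

On the sample rows this picks id 3 for episode 1 and id 5 for episode 2, the same two rows the question lists as the desired result.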
