SQL Query for X Number of Distinct Column Values, All Rows

Is it possible to write a SQL query that picks X distinct values of a column and returns every row holding one of those values?

For example, say I have rows with the following DocIDs

1,
2,
3,
4,
5,
3

I want to return the top 4 distinct DocIDs

Therefore I should get back

1,
2,
3,
4,
3

I want the top 4 distinct DocIDs where I get every row containing each of those DocIDs.

Is such a query possible?

EDIT: This is for a client who is using MySQL. Also, using an "order by" clause caused an error with the database log being too large.

Thank you

With common table expressions (CTEs) and window functions you can do this using DENSE_RANK. Note that in every example you have to decide how to order the distinct values in order to pick the top X. In the case above it looks like you ordered by the integer value.

Also, while this code uses a SQL Server table variable, the concept works in most modern RDBMSs that support CTEs and window functions.

DECLARE @Table AS TABLE (I INT)
INSERT INTO @Table VALUES (1),(2),(3),(4),(5),(4)

;WITH cte AS (
    SELECT
       *
       ,DENSE_RANK() OVER (ORDER BY I) as Ranking
    FROM
       @Table
)

SELECT *
FROM
    cte
WHERE
    Ranking <= 4
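Since the question is about MySQL rather than SQL Server, here is a minimal runnable sketch of the same DENSE_RANK idea using the question's sample data. It assumes an engine with window-function support (MySQL 8.0+; the sketch itself runs on Python's bundled SQLite, 3.25 or newer, purely as a stand-in). The table name docs, the column doc_id, and the function name are hypothetical.

```python
import sqlite3


def top_docs_dense_rank(doc_ids, x):
    """Return every row whose doc_id is among the x smallest distinct values."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute("CREATE TABLE docs (doc_id INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO docs (doc_id) VALUES (?)",
                     [(d,) for d in doc_ids])
    rows = conn.execute(
        """
        WITH ranked AS (
            SELECT doc_id,
                   -- equal doc_ids share a rank, and ranks have no gaps
                   DENSE_RANK() OVER (ORDER BY doc_id) AS ranking
            FROM docs
        )
        SELECT doc_id
        FROM ranked
        WHERE ranking <= ?
        ORDER BY doc_id
        """,
        (x,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(top_docs_dense_rank([1, 2, 3, 4, 5, 3], 4))  # [1, 2, 3, 3, 4]
```

Both rows with DocID 3 come back because DENSE_RANK gives duplicates the same rank.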

Or you could go more old-school: use GROUP BY to get the distinct values, then join back to the table.

;WITH cte AS (
    SELECT TOP 4 I
    FROM
       @Table
    GROUP BY
       I
    ORDER BY
       I
)

SELECT
    t.I
FROM
    cte c
    INNER JOIN @Table t
    ON c.I = t.I    

And here is how it might look without Windowed Functions or Common Table Expressions:

SELECT
    t.I
FROM
    (
        SELECT TOP 4 I
        FROM
           @Table
        GROUP BY
           I
        ORDER BY
           I
    ) c
    INNER JOIN @Table t
    ON c.I = t.I
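TOP is SQL Server syntax; MySQL uses LIMIT instead. A rough sketch of the same derived-table join written with LIMIT, which should also work on MySQL versions without CTEs or window functions. It runs here on Python's bundled SQLite as a stand-in, and the table name docs, the column doc_id, and the function name are hypothetical.

```python
import sqlite3


def top_docs_join(doc_ids, x):
    """Pick the x smallest distinct doc_ids, then join back to get all rows."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute("CREATE TABLE docs (doc_id INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO docs (doc_id) VALUES (?)",
                     [(d,) for d in doc_ids])
    rows = conn.execute(
        """
        SELECT t.doc_id
        FROM (
            -- GROUP BY collapses duplicates; LIMIT keeps the first x values
            SELECT doc_id
            FROM docs
            GROUP BY doc_id
            ORDER BY doc_id
            LIMIT ?
        ) AS c
        INNER JOIN docs AS t
            ON c.doc_id = t.doc_id
        ORDER BY t.doc_id
        """,
        (x,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(top_docs_join([1, 2, 3, 4, 5, 3], 4))  # [1, 2, 3, 3, 4]
```

The outer ORDER BY is only there to make the output order predictable; drop it if row order does not matter.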

This would often be solved by doing:

select t.*
from t
where t.docid in (select t2.docid
                  from t t2
                  group by t2.docid
                  order by t2.docid
                  fetch first 4 rows only
                 );

Notes:

  • You don't specify how "top 4" is chosen. This query bases it on the numerical ordering of docid.
  • This uses group by rather than select distinct, so the ordering can be based on something else (say, the number of rows or the timing of rows).
  • fetch first 4 rows only is standard SQL. Some databases use limit or top instead.
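To illustrate the second note, here is a small sketch where "top" means "most rows" rather than "smallest DocID": the subquery orders by COUNT(*), which group by allows. MySQL has historically rejected LIMIT directly inside an IN (...) subquery, so this sketch joins to a derived table instead. It runs on Python's bundled SQLite as a stand-in; the table name docs, the column doc_id, the function name, and the tie-break on doc_id are all assumptions made for the example.

```python
import sqlite3


def top_docs_by_frequency(doc_ids, x):
    """Return all rows for the x doc_ids that occur most often."""
    conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    conn.execute("CREATE TABLE docs (doc_id INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO docs (doc_id) VALUES (?)",
                     [(d,) for d in doc_ids])
    rows = conn.execute(
        """
        SELECT t.doc_id
        FROM (
            SELECT doc_id
            FROM docs
            GROUP BY doc_id
            -- most frequent first; ties broken by the smaller doc_id
            ORDER BY COUNT(*) DESC, doc_id
            LIMIT ?
        ) AS c
        INNER JOIN docs AS t
            ON c.doc_id = t.doc_id
        ORDER BY t.doc_id
        """,
        (x,),
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


print(top_docs_by_frequency([1, 2, 3, 4, 5, 3], 2))  # [1, 3, 3]
```

With the question's data, DocID 3 wins on row count and DocID 1 wins the tie among the rest, so the rows returned are 1, 3, 3.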
