
Select rows with unique values for one specific column in large table

table1 has 3 columns in my database: id, timestamp, cluster, and it has about 1M rows. I want to query the newest 24 rows with unique cluster values (no cluster value may be repeated in the returned 24 rows). The usual solution would be:

SELECT
    *
FROM table1
GROUP BY cluster
ORDER BY timestamp DESC
LIMIT 24

However, since I have 1M rows, this query takes a long time to execute. So my solution was to run:

WITH x AS
(
    SELECT
        *
    FROM `table1`
    ORDER BY timestamp DESC
    LIMIT 50
)
SELECT
    *
FROM x
GROUP BY x.cluster
ORDER BY x.timestamp DESC
LIMIT 24

This assumes that we can find 24 rows with unique cluster values among the newest 50 rows. This query runs much faster (~0.007 sec). Is there a more efficient or more standard way to handle such a case?

Your assumption that you will find 24 different clusters in the last 50 rows may not be correct.

Try the ROW_NUMBER() window function:

SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER (PARTITION BY cluster ORDER BY timestamp DESC) rn
  FROM table1
) t
WHERE rn = 1
ORDER BY timestamp DESC LIMIT 24
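To make the difference concrete, here is a minimal sketch that runs both approaches on made-up data. It uses Python's sqlite3 module as a stand-in for the asker's database (an assumption; window functions need SQLite 3.25 or later), and the 50-row shortcut is written with MAX(timestamp) rather than SELECT * so that the row reported per cluster is well defined. The data is arranged so that the newest 50 rows contain only 3 clusters.

```python
import sqlite3

# In-memory SQLite stands in for the real database; all data here is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, timestamp INTEGER, cluster INTEGER)"
)

# Older rows: 30 clusters (0..29), one row each, timestamps 1..30.
rows = [(ts, ts - 1) for ts in range(1, 31)]
# Newest 50 rows: only clusters 0, 1 and 2, timestamps 31..80.
rows += [(ts, ts % 3) for ts in range(31, 81)]
conn.executemany("INSERT INTO table1 (timestamp, cluster) VALUES (?, ?)", rows)

# The shortcut from the question: deduplicate only within the newest 50 rows.
shortcut = conn.execute("""
    WITH x AS (SELECT * FROM table1 ORDER BY timestamp DESC LIMIT 50)
    SELECT cluster, MAX(timestamp) AS newest
    FROM x
    GROUP BY cluster
    ORDER BY newest DESC
    LIMIT 24
""").fetchall()

# ROW_NUMBER() over the whole table: newest row per cluster, then the top 24.
full = conn.execute("""
    SELECT id, timestamp, cluster
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY cluster ORDER BY timestamp DESC) AS rn
        FROM table1
    ) t
    WHERE rn = 1
    ORDER BY timestamp DESC
    LIMIT 24
""").fetchall()

print(len(shortcut), len(full))  # prints "3 24"
```

With this data the shortcut silently returns 3 rows instead of 24, while the window-function query returns the full 24 at the cost of reading every row.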

You can use row_number(), but you need the right indexes:

select t.*
from (select t.*,
             row_number() over (partition by cluster order by timestamp desc) as seqnum
      from table1 t
     ) t
where seqnum = 1
order by timestamp desc
limit 24;

The index you want is on (cluster, timestamp desc).
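As a rough sketch of what creating that index could look like, the snippet below builds it on an empty copy of the table in SQLite via Python's sqlite3 module (a stand-in for the real database). The index name idx_cluster_ts is hypothetical. The same CREATE INDEX statement is accepted by MySQL, but note that MySQL only honors DESC in an index definition from version 8.0 onward; earlier versions parse it and ignore it.

```python
import sqlite3

# In-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, timestamp INTEGER, cluster INTEGER)"
)

# Composite index matching PARTITION BY cluster ORDER BY timestamp DESC.
# The name idx_cluster_ts is made up for this example.
conn.execute("CREATE INDEX idx_cluster_ts ON table1 (cluster, timestamp DESC)")

# Inspect what was created: index names on the table, then the indexed columns in order.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('table1')")]
index_columns = [row[2] for row in conn.execute("PRAGMA index_info('idx_cluster_ts')")]
print(index_names, index_columns)
```

The column order matters: cluster comes first so that each partition is contiguous in the index, and timestamp second so that the newest row of each cluster is adjacent to the partition boundary.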

For your purposes, this may still not be sufficient, because it still processes all the rows, even with an index, when you only need a couple of dozen.

I don't know how many recent rows you need to be sure that you have 24 clusters. However, you might find that this works better if we assume that the most recent 1000 rows have at least 24 clusters:

select t.*
from (select t.*,
             row_number() over (partition by cluster order by timestamp desc) as seqnum
      from (select t.*
            from table1 t
            order by timestamp desc
            limit 1000
           ) t
     ) t
where seqnum = 1
order by timestamp desc
limit 24;

For this, you want an index only on (timestamp desc).
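If the 1000-row assumption cannot be guaranteed, one possible workaround (not part of the answer above, just a sketch) is to run the bounded query and double the window whenever it yields fewer than 24 clusters. The example uses Python's sqlite3 module as a stand-in database; the function name newest_per_cluster and the max_window cap are hypothetical.

```python
import sqlite3

def newest_per_cluster(conn, wanted=24, window=1000, max_window=1_000_000):
    """Return the newest row per cluster, doubling the scanned window of
    recent rows until `wanted` clusters are found or `max_window` is reached."""
    query = """
        SELECT id, timestamp, cluster
        FROM (
            SELECT r.*, ROW_NUMBER() OVER (PARTITION BY cluster ORDER BY timestamp DESC) AS seqnum
            FROM (SELECT * FROM table1 ORDER BY timestamp DESC LIMIT ?) r
        ) t
        WHERE seqnum = 1
        ORDER BY timestamp DESC
        LIMIT ?
    """
    while True:
        rows = conn.execute(query, (window, wanted)).fetchall()
        if len(rows) >= wanted or window >= max_window:
            return rows
        window *= 2  # not enough distinct clusters yet: look further back

# Invented data: the newest 2000 rows hold only 10 clusters,
# the 3000 rows before them hold 40 other clusters.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, timestamp INTEGER, cluster INTEGER)"
)
conn.execute("CREATE INDEX idx_ts ON table1 (timestamp DESC)")  # hypothetical index name
conn.executemany(
    "INSERT INTO table1 (timestamp, cluster) VALUES (?, ?)",
    [(ts, ts % 10 if ts > 3000 else 10 + ts % 40) for ts in range(1, 5001)],
)

result = newest_per_cluster(conn)
print(len(result), result[0][1], result[-1][1])  # prints "24 5000 2987"
```

In the common case this costs a single small query; only when recent rows are dominated by a few clusters does it pay for extra round trips.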

Note: You might find that a where clause on the timestamp works better in this case:

where timestamp > now() - interval 24 hour

for instance, to only consider rows from the past 24 hours.
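Here is a sketch of how the time filter might slot into the full query. It runs on SQLite through Python's sqlite3 module, which has no now() - interval syntax, so the cutoff is computed in Python and passed as a parameter. Storing timestamps as epoch seconds, the fixed value of now, and the index name are all assumptions made for the example.

```python
import sqlite3

# In-memory SQLite stands in for the real database; timestamps are epoch seconds.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, timestamp INTEGER, cluster INTEGER)"
)
conn.execute("CREATE INDEX idx_ts ON table1 (timestamp DESC)")  # hypothetical index name

now = 1_700_000_000  # fixed "current time" so the example is reproducible
hour = 3600

# Invented data: one row per hour going back 72 hours, clusters cycling through 0..29.
conn.executemany(
    "INSERT INTO table1 (timestamp, cluster) VALUES (?, ?)",
    [(now - h * hour, h % 30) for h in range(72)],
)

# Equivalent of: where timestamp > now() - interval 24 hour
cutoff = now - 24 * hour
recent = conn.execute("""
    SELECT id, timestamp, cluster
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY cluster ORDER BY timestamp DESC) AS seqnum
        FROM table1
        WHERE timestamp > ?
    ) t
    WHERE seqnum = 1
    ORDER BY timestamp DESC
    LIMIT 24
""", (cutoff,)).fetchall()

print(len(recent))  # prints "24"
```

The same caveat applies as with a row limit: if fewer than 24 clusters were active inside the time window, the query returns fewer than 24 rows.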

Since you want "one specific cluster value", this will be fast:

SELECT
    *
FROM table1
WHERE cluster = ?
ORDER BY timestamp DESC
LIMIT 24

And have

INDEX(cluster, timestamp)

If that is not what you want, please reword the title and the Question.


