
optimize a 'having count (distinct)' query for big data

I have the query below, which I need to run on a table with 100 million records, but it's extremely slow (it has been running for 5 hours so far). I'm not sure how to optimize it and would be grateful for any help. The table has an index on DID and week_no, contains several other columns that are not indexed, and has an indexed primary key (id).

 DELETE FROM test WHERE "DID" IN (SELECT "DID" FROM test GROUP BY "DID" having count(distinct week_no) < 4 );

thanks!
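To make the intended behavior concrete, here is a minimal sketch of the query's semantics using Python's built-in sqlite3 module on toy data (the schema is an assumption reduced to just DID and week_no; the real table has more columns): every row whose DID appears in fewer than 4 distinct weeks is deleted.

```python
import sqlite3

# Assumed toy schema: only DID and week_no matter for this query.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, "DID" INTEGER, week_no INTEGER)')

# DID 1 spans 4 distinct weeks -> kept; DID 2 spans only 2 -> deleted.
rows = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 1), (2, 2)]
cur.executemany('INSERT INTO test ("DID", week_no) VALUES (?, ?)', rows)

# The original query, unchanged (SQLite accepts this form).
cur.execute('''DELETE FROM test WHERE "DID" IN
               (SELECT "DID" FROM test GROUP BY "DID"
                HAVING COUNT(DISTINCT week_no) < 4)''')
conn.commit()

remaining = sorted({r[0] for r in cur.execute('SELECT "DID" FROM test')})
print(remaining)  # [1]
```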

This would be written most efficiently using a DELETE with a JOIN (or USING for PostgreSQL), to avoid having to compute the counts for each row:

For PostgreSQL:

DELETE
FROM test t1
USING (SELECT did, COUNT(DISTINCT week_no) AS num_weeks
       FROM test
       GROUP BY did) t2
WHERE t2.did = t1.did AND num_weeks < 4

Demo on dbfiddle

In MySQL:

DELETE t1
FROM test t1
JOIN (SELECT did, COUNT(DISTINCT week_no) AS num_weeks
      FROM test
      GROUP BY did) t2 ON t2.did = t1.did
WHERE num_weeks < 4

Demo on dbfiddle
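SQLite supports neither DELETE ... USING nor multi-table DELETE ... JOIN, but the core idea of the rewrite above (aggregate once, then delete against the small result) can still be sketched with Python's sqlite3 by materializing the aggregate first. The two-step temp-table form below is an illustrative adaptation, not the answer's exact syntax:

```python
import sqlite3

# Assumed toy schema, as in the question: DID plus week_no.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('CREATE TABLE test (id INTEGER PRIMARY KEY, "DID" INTEGER, week_no INTEGER)')
rows = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2)]
cur.executemany('INSERT INTO test ("DID", week_no) VALUES (?, ?)', rows)

# Step 1: one aggregation pass over the table, computed once.
cur.execute('''CREATE TEMP TABLE to_delete AS
               SELECT "DID" FROM test
               GROUP BY "DID"
               HAVING COUNT(DISTINCT week_no) < 4''')

# Step 2: delete by matching against the (much smaller) result set.
cur.execute('DELETE FROM test WHERE "DID" IN (SELECT "DID" FROM to_delete)')
conn.commit()

print(sorted({r[0] for r in cur.execute('SELECT "DID" FROM test')}))  # [1]
```

The point is the same as in the USING/JOIN versions: the COUNT(DISTINCT week_no) aggregation runs a single time, instead of being re-evaluated per deleted row.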
