Optimize a 'having count (distinct )' query for big data
I have the query below that I need to run on a table with 100 million records, but it's extremely slow (it has been running for 5 hours so far). I am not sure how to optimize it and would be grateful for any help. The table has an index on DID and week_no, contains several other columns that are not indexed, and has an indexed primary key (id).
DELETE FROM test WHERE "DID" IN (SELECT "DID" FROM test GROUP BY "DID" having count(distinct week_no) < 4 );
Thanks!
This would be most efficiently written using a DELETE with JOIN (or USING for PostgreSQL), to avoid having to compute the counts for each row:
For PostgreSQL:
DELETE
FROM test t1
USING (SELECT "DID", COUNT(DISTINCT week_no) AS num_weeks
       FROM test
       GROUP BY "DID") t2
WHERE t2."DID" = t1."DID" AND t2.num_weeks < 4;
In MySQL:
DELETE t1
FROM test t1
JOIN (SELECT did, COUNT(DISTINCT week_no) AS num_weeks
FROM test
GROUP BY did) t2 ON t2.did = t1.did
WHERE t2.num_weeks < 4;
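As a quick sanity check of the logic (not the performance), the same delete can be exercised in SQLite, which supports neither DELETE ... USING nor DELETE ... JOIN; the semantically equivalent IN form from the question is used instead. The sample data below is made up purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, DID INTEGER, week_no INTEGER)")

# DID 1 appears in 4 distinct weeks (should be kept);
# DID 2 appears in only 2 distinct weeks (should be deleted).
rows = [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 1), (2, 2)]
cur.executemany("INSERT INTO test (DID, week_no) VALUES (?, ?)", rows)

# SQLite has no DELETE ... JOIN/USING, so use the IN form
# to verify which rows survive the < 4 distinct-weeks filter.
cur.execute("""
    DELETE FROM test
    WHERE DID IN (SELECT DID FROM test
                  GROUP BY DID
                  HAVING COUNT(DISTINCT week_no) < 4)
""")
conn.commit()

remaining = sorted({row[0] for row in cur.execute("SELECT DID FROM test")})
print(remaining)  # only DID 1 survives
```

On the real 100-million-row table, the JOIN/USING forms above let the database build the per-DID counts once from the (DID, week_no) index rather than re-evaluating the subquery per row, which is where the speedup comes from.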