
Speed up a delete-duplicates command in MySQL

I have a table t1 like this:

application_id | abstract_text   |
1              | long paragraph1 |
2              | long paragraph2 |
3              | long paragraph1 |

It has around 150,000 unique ids, but some ids share the same abstract_text value (like 1 and 3).

I'm using this command:

delete t1 from t1 
inner join t1 t2 
where
    t1.application_id < t2.application_id AND
    t1.abstract_text=t2.abstract_text;

However, it has been running for over 2 hours and still hasn't finished. The abstract_text values are long paragraphs, so I know it won't be fast. I tried creating an index, but I can't index abstract_text directly because it's too long (it throws ER_TOO_LONG_KEY: Specified key was too long; max key length is 3072 bytes), and I'm not sure how else an index could help here.

Is there any way to speed this up?

Deleting rows is an expensive operation for a database. Instead, you can create a new table that contains only one row per distinct paragraph, for example:

-- Keep one row per distinct abstract_text: the one with the highest
-- application_id, which is the row the DELETE in the question would keep.
-- A single GROUP BY covers both unique and duplicated texts, so no
-- self-join is needed.
CREATE TABLE t3 AS
SELECT MAX(application_id) AS application_id, abstract_text
  FROM t1
 GROUP BY abstract_text;
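
Once t3 holds the rows you want to keep, it still has to take the place of t1. A minimal sketch of that swap, assuming nothing writes to t1 while it runs; t1_old is a hypothetical name for the backup copy:

```sql
-- Compare row counts before swapping: t3 should be smaller than t1
-- by the number of duplicate rows that were dropped.
SELECT COUNT(*) AS rows_before FROM t1;
SELECT COUNT(*) AS rows_after  FROM t3;

-- CREATE TABLE ... AS SELECT does not copy the primary key or any
-- indexes, so recreate whatever t1 had on t3 before the swap.

-- Rename both tables in one statement so the swap is atomic.
RENAME TABLE t1 TO t1_old, t3 TO t1;

-- After checking the result, the old copy can be dropped:
-- DROP TABLE t1_old;
```

Keeping t1_old around until the new table has been checked makes the change easy to undo.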

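
If the rows have to be deleted in place, one way around ER_TOO_LONG_KEY is to index a fixed-length hash of the text instead of the text itself. This is only a sketch, assuming MySQL 5.7 or later (for generated columns); abstract_hash and idx_abstract_hash are hypothetical names, and whether it is actually faster depends on the data:

```sql
-- Store a 16-byte MD5 digest of each paragraph and index it together
-- with application_id. The digest is short, so the key length limit
-- no longer applies.
ALTER TABLE t1
  ADD COLUMN abstract_hash BINARY(16)
    GENERATED ALWAYS AS (UNHEX(MD5(abstract_text))) STORED,
  ADD INDEX idx_abstract_hash (abstract_hash, application_id);

-- Same delete as in the question, but the join can now look up
-- candidates through the indexed hash. The full-text comparison is
-- kept so that a hash collision cannot delete a non-duplicate row.
DELETE t1
  FROM t1
  JOIN t1 AS t2
    ON t1.abstract_hash = t2.abstract_hash
   AND t1.abstract_text = t2.abstract_text
   AND t1.application_id < t2.application_id;
```

As in the original command, the row with the highest application_id in each group of duplicates is the one that survives.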
