I have a table t1 like this:
id | abstract_text |
1 | long paragraph1 |
2 | long paragraph2 |
3 | long paragraph1 |
It has around 150,000 unique ids (the application_id column in the query below), but some ids have the same abstract_text value (like 1 and 3).
I'm using this command:
delete t1 from t1
inner join t1 t2
where
t1.application_id < t2.application_id AND
t1.abstract_text=t2.abstract_text;
However, it's been over 2 hours and it hasn't finished running. The abstract_text values are long paragraphs, so I know it won't be fast. I tried creating an index, but I'm not sure how to use one here, because I can't create an index on abstract_text: it's too long (it throws the error ER_TOO_LONG_KEY: Specified key was too long; max key length is 3072 bytes).
Is there any way to speed up this process?
Deleting rows is a costly operation for a database. You may prefer to create a new table with the duplicate paragraphs removed, like this:
-- Keep one row per distinct abstract_text: the one with the highest
-- application_id, which is the row the original DELETE would have kept.
-- A single GROUP BY handles texts that occur once, twice, or more often,
-- so no self-join or UNION ALL is needed.
CREATE TABLE t3 AS
SELECT MAX(application_id) AS application_id, abstract_text
FROM t1
GROUP BY abstract_text;
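If the rows have to be deleted in place, another option is to compare a short hash of each paragraph instead of the paragraph itself. The full abstract_text cannot be indexed because of the key length limit, but a fixed-size hash of it can. The following is only a sketch under assumptions: the column name text_hash and the index name idx_text_hash are made up for illustration, and it has not been timed against the real data.

```sql
-- Hypothetical helper column: 32 raw bytes of SHA-256 per row.
ALTER TABLE t1 ADD COLUMN text_hash BINARY(32);

-- Fill it once; this is a single pass over the table.
UPDATE t1 SET text_hash = UNHEX(SHA2(abstract_text, 256));

-- The hash is short enough to index, unlike abstract_text itself.
CREATE INDEX idx_text_hash ON t1 (text_hash);

-- Same delete as in the question, but the join can now use the index.
-- The abstract_text comparison is kept as a safety check, so only rows
-- whose text is really identical are treated as duplicates.
DELETE t1
FROM t1
INNER JOIN t1 AS t2
        ON t1.text_hash = t2.text_hash
       AND t1.abstract_text = t2.abstract_text
       AND t1.application_id < t2.application_id;

-- Optional cleanup once the duplicates are gone.
ALTER TABLE t1 DROP COLUMN text_hash;
```

As in the original DELETE, the row with the highest application_id in each group of duplicates is the one that survives. Testing this on a copy of the table first would be prudent.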