Removing duplicates with unique index
I ran an insert between two tables on fields A, B, C, D, believing I had created a unique index on A, B, C, D to prevent duplicates. However, I had somehow made only a normal index on those columns, so duplicates got inserted. It is a 20 million record table.
If I change my existing index from normal to unique, or simply add a new unique index on A, B, C, D, will the duplicates be removed, or will adding the index fail because duplicate records exist? I'd test it, but it is 30 million records and I don't wish to either mess the table up or duplicate it.
If you have duplicates in your table and you use
ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
the query will fail with Error 1062 (duplicate key).
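To see this behaviour without touching the real table, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption: SQLite reports the failure as sqlite3.IntegrityError rather than MySQL's Error 1062, but the outcome is the same. The table name mytable and the key columns follow the answer; the sample rows are made up. It shows that creating the unique index fails and that no rows are removed.

```python
import sqlite3


def try_add_unique_index(conn: sqlite3.Connection) -> bool:
    """Try to add a unique index on (a, b, c, d); return True on success."""
    try:
        conn.execute("CREATE UNIQUE INDEX myindex ON mytable (a, b, c, d)")
        return True
    except sqlite3.IntegrityError as exc:
        # SQLite's counterpart of MySQL's Error 1062 (duplicate key).
        print("index creation failed:", exc)
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INT, b INT, c INT, d INT)")
# Hypothetical sample data: the first two rows are duplicates on (a, b, c, d).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (1, 1, 1, 1), (2, 2, 2, 2)],
)

ok = try_add_unique_index(conn)
row_count = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
# The index was not created, and all three rows are still in the table.
print(ok, row_count)
```

In other words, a failed index creation leaves the data exactly as it was, so running the statement against a table that contains duplicates is safe.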
But if you use IGNORE
-- (only works before MySQL 5.7.4)
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
the duplicates will be removed. But the documentation doesn't specify which row will be kept:
IGNORE is a MySQL extension to standard SQL. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNORE is not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNORE is specified, only one row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.
As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.
(ALTER TABLE Syntax)
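Because the documentation leaves open which row ALTER IGNORE keeps, one alternative (not part of the original answer) is to delete the extra rows yourself with an explicit rule before adding the index. The sketch below again uses SQLite through Python's sqlite3 as a stand-in. For each (a, b, c, d) group it keeps the row with the smallest rowid. The note column and the sample rows are hypothetical. In MySQL there is no rowid, so you would need a key column such as an auto-increment id. The DELETE would also typically have to be rewritten, for example as a multi-table DELETE with a self-join, because MySQL rejects a DELETE whose subquery selects from the table being deleted from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INT, b INT, c INT, d INT, note TEXT)")
# Hypothetical rows: 'first' and 'second' share the same key (1, 1, 1, 1).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, "first"), (1, 1, 1, 1, "second"), (2, 2, 2, 2, "only")],
)

# Keep the earliest-inserted row (smallest rowid) of each duplicate group
# and delete the rest, so the surviving row is chosen by an explicit rule.
conn.execute(
    """
    DELETE FROM mytable
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM mytable GROUP BY a, b, c, d
    )
    """
)
# With the duplicates gone, the unique index can now be created.
conn.execute("CREATE UNIQUE INDEX myindex ON mytable (a, b, c, d)")
conn.commit()

notes = [row[0] for row in conn.execute("SELECT note FROM mytable ORDER BY a")]
print(notes)  # ['first', 'only']
```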
If your version is 5.7.4 or greater, you can copy the data into another table, empty the original table, add the unique index, and then copy the data back with INSERT IGNORE (which is still available):
CREATE TABLE tmp_data SELECT * FROM mytable;
TRUNCATE TABLE mytable;
ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
INSERT IGNORE INTO mytable SELECT * from tmp_data;
DROP TABLE tmp_data;
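You can try the same five steps on a small scale before running them on the real table. This sketch assumes SQLite through Python's sqlite3 as a stand-in for MySQL. SQLite has no TRUNCATE TABLE, so a plain DELETE is used instead, and SQLite's counterpart of MySQL's INSERT IGNORE is INSERT OR IGNORE. The sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INT, b INT, c INT, d INT)")
# Hypothetical data containing one duplicate of (1, 1, 1, 1).
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (1, 1, 1, 1), (2, 2, 2, 2)],
)

# 1. Copy the data out (the copy has no indexes, so it can hold duplicates).
conn.execute("CREATE TABLE tmp_data AS SELECT * FROM mytable")
# 2. Empty the original table (SQLite has no TRUNCATE; DELETE does the job here).
conn.execute("DELETE FROM mytable")
# 3. Add the unique index while the table is empty.
conn.execute("CREATE UNIQUE INDEX myindex ON mytable (a, b, c, d)")
# 4. Copy the data back; rows that collide with the unique index are skipped.
inserted = conn.execute("INSERT OR IGNORE INTO mytable SELECT * FROM tmp_data").rowcount
# 5. Drop the temporary copy.
conn.execute("DROP TABLE tmp_data")
conn.commit()

total = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()[0]
print(inserted, total)  # 2 rows copied back, the duplicate was skipped
```

As with ALTER IGNORE, which of the duplicate rows survives depends on the order in which the copy-back reads tmp_data. If that matters, decide explicitly which row to keep before copying the data back.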
If you use the IGNORE modifier, errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors generate warnings instead.
Also see: INSERT ... SELECT Syntax and Comparison of the IGNORE Keyword and Strict SQL Mode
If you think there will be duplicates, adding the unique index will fail. First check which duplicates there are:
select * from
(select a,b,c,d,count(*) as n from table_name group by a,b,c,d) x
where x.n > 1
This may be an expensive query on 20M rows, but it will get you all the duplicate keys that would prevent you from adding the unique index. You could split this up into smaller chunks by adding a WHERE clause to the subquery:
where a='some_value'
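Both variants of the check can be wrapped in one small helper. This sketch uses SQLite through Python's sqlite3 as a stand-in for MySQL. The function name find_duplicates and the sample rows are hypothetical. The helper runs the answer's query either over the whole table or for a single value of a. It passes that value as a bound parameter instead of splicing it into the SQL string.

```python
import sqlite3
from typing import Any, Optional


def find_duplicates(conn: sqlite3.Connection, a_value: Optional[Any] = None) -> list:
    """Return (a, b, c, d, n) for every key that occurs more than once.

    If a_value is given, only rows with that value of a are checked, so a big
    table can be scanned in smaller chunks.
    """
    where = "WHERE a = ?" if a_value is not None else ""
    params = (a_value,) if a_value is not None else ()
    # Only a fixed SQL fragment is formatted in; the value itself is bound.
    sql = f"""
        SELECT * FROM (
            SELECT a, b, c, d, COUNT(*) AS n
            FROM mytable {where}
            GROUP BY a, b, c, d
        ) x
        WHERE x.n > 1
        ORDER BY x.a, x.b, x.c, x.d
    """
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a TEXT, b INT, c INT, d INT)")
# Hypothetical rows: one duplicate key for a = 'x' and one for a = 'y'.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [("x", 1, 1, 1), ("x", 1, 1, 1), ("y", 2, 2, 2), ("y", 2, 2, 2), ("y", 3, 3, 3)],
)

all_dups = find_duplicates(conn)      # whole table in one query
chunk_y = find_duplicates(conn, "y")  # only the a = 'y' slice
print(all_dups)  # [('x', 1, 1, 1, 2), ('y', 2, 2, 2, 2)]
print(chunk_y)   # [('y', 2, 2, 2, 2)]
```

To cover the whole table chunk by chunk, loop over SELECT DISTINCT a FROM mytable and call find_duplicates once per value. When every call returns an empty list, the unique index can be added.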
For the records retrieved, you will have to change something to make the rows unique. Once that is done (the query returns 0 rows), you should be safe to add the unique index.
You can use ON DUPLICATE KEY UPDATE instead of IGNORE, which lets you control which values prevail.
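Here is a sketch of that idea applied to the copy-back step, again using SQLite through Python's sqlite3 as a stand-in. SQLite writes MySQL's ON DUPLICATE KEY UPDATE as ON CONFLICT (...) DO UPDATE. That syntax needs SQLite 3.24.0 or later, which is an assumption about the SQLite version your Python links against. The extra column score and the rule "keep the highest score" are hypothetical. The point is that the update clause, not the order in which rows arrive, decides which value survives. SQLite's documentation asks for a WHERE clause on the SELECT in this form to avoid a parsing ambiguity, which is why the statement includes WHERE 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_data (a INT, b INT, c INT, d INT, score INT)")
# Hypothetical copy of the original data: the key (1, 1, 1, 1) appears twice.
conn.executemany(
    "INSERT INTO tmp_data VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 10), (1, 1, 1, 1, 30), (2, 2, 2, 2, 5)],
)
conn.execute("CREATE TABLE mytable (a INT, b INT, c INT, d INT, score INT)")
conn.execute("CREATE UNIQUE INDEX myindex ON mytable (a, b, c, d)")

# Copy the rows back. On a duplicate key, keep whichever score is larger:
# 'score' is the row already in mytable, 'excluded.score' is the incoming row.
conn.execute(
    """
    INSERT INTO mytable (a, b, c, d, score)
    SELECT a, b, c, d, score FROM tmp_data WHERE 1
    ON CONFLICT (a, b, c, d) DO UPDATE SET score = MAX(score, excluded.score)
    """
)
conn.commit()

rows = conn.execute("SELECT a, b, c, d, score FROM mytable ORDER BY a").fetchall()
print(rows)  # [(1, 1, 1, 1, 30), (2, 2, 2, 2, 5)]
```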
To answer your question: adding a UNIQUE constraint on a column that has duplicate values will throw an error.
For example, you can try the following script:
CREATE TABLE `USER` (
`USER_ID` INT NOT NULL,
`USERNAME` VARCHAR(45) NOT NULL,
`NAME` VARCHAR(45) NULL,
PRIMARY KEY (`USER_ID`));
INSERT INTO USER VALUES(1,'apple', 'woz'),(2,'apple', 'jobs'),
(3,'google', 'sergey'),(4,'google', 'larry');
ALTER TABLE `USER`
ADD UNIQUE INDEX `USERNAME_UNIQUE` (`USERNAME` ASC);
/*
Operation failed: There was an error while applying the SQL script to the database.
ERROR 1062: Duplicate entry 'apple' for key 'USERNAME_UNIQUE'
*/