
How to delete entries that share a similar pattern in MySQL

I have a column that may contain entries like this: abc.yahoo.com, efg.yahoo.com, hij.yahoo.com

I need to delete all the duplicates and leave only one of each, as I don't need the others. Such a command is easy to write if I know the second part (e.g. yahoo.com), but my problem is that this part is not fixed. I may also have entries such as: abc.msn.com, efg.msn.com, hij.msn.com

And I want to treat all these cases at once. Is this possible?

This assumes that you just want to strip the letters before the first . and then group on what remains of the column:

DELETE a FROM tbl a
LEFT JOIN
(
    SELECT   MIN(id) AS id
    FROM     tbl
    GROUP BY SUBSTRING(column, LOCATE('.', column))
) b ON a.id = b.id
WHERE b.id IS NULL

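Where id is your primary key column name, and column is a placeholder for the column that contains the values to group on. COLUMN itself is a reserved word in MySQL, so substitute your real column name or quote it with backticks.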

This will also account for domains like xxx.co.uk where you have two parts at the end.
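As a quick check of what the grouping expression produces, it can be evaluated on literal values. The host names below are only sample inputs, not data from the question:

```sql
-- LOCATE finds the first '.', SUBSTRING keeps everything from that position on.
SELECT SUBSTRING('abc.yahoo.com',  LOCATE('.', 'abc.yahoo.com'));   -- '.yahoo.com'
SELECT SUBSTRING('hij.yahoo.com',  LOCATE('.', 'hij.yahoo.com'));   -- '.yahoo.com'
SELECT SUBSTRING('abc.ebay.co.uk', LOCATE('.', 'abc.ebay.co.uk'));  -- '.ebay.co.uk'

-- Caveat: a value with no subdomain loses its domain name as well.
SELECT SUBSTRING('yahoo.com', LOCATE('.', 'yahoo.com'));            -- '.com'
```

The last line suggests that bare domains such as yahoo.com and msn.com would fall into the same '.com' group, so it may be worth checking whether the column holds such values before running the DELETE.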

Make sure you have a backup of your current data, or run this operation within a transaction (so that you can ROLLBACK; if the result isn't what you need).
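A minimal sketch of the transactional approach, assuming tbl uses a transactional storage engine such as InnoDB (on MyISAM the ROLLBACK would not undo the delete). The `column` name is still the placeholder from above, quoted because COLUMN is a reserved word:

```sql
START TRANSACTION;

-- Preview: one row per group, with the id that would be kept.
SELECT   SUBSTRING(`column`, LOCATE('.', `column`)) AS suffix,
         MIN(id)  AS kept_id,
         COUNT(*) AS rows_in_group
FROM     tbl
GROUP BY suffix;

-- Delete every row whose id is not the lowest id of its group.
DELETE a FROM tbl a
LEFT JOIN
(
    SELECT   MIN(id) AS id
    FROM     tbl
    GROUP BY SUBSTRING(`column`, LOCATE('.', `column`))
) b ON a.id = b.id
WHERE b.id IS NULL;

-- Inspect what is left before deciding.
SELECT * FROM tbl;

ROLLBACK;   -- or COMMIT; once the remaining rows look right
```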

EDIT: If, after deleting the duplicates, you want to replace the letters before the first . with *, you can simply use:

UPDATE tbl
SET column = CONCAT('*', SUBSTRING(column, LOCATE('.', column)))

To delete the duplicates you can use

DELETE t1 FROM your_table t1
LEFT JOIN
(
    SELECT   MIN(id) AS id
    FROM     your_table
    GROUP BY SUBSTRING_INDEX(REVERSE(col), '.', 2)
) t2 ON t2.id = t1.id
WHERE t2.id IS NULL
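To see what the GROUP BY expression in this query evaluates to, it can be broken down step by step on a sample host (the literal is just an illustration):

```sql
SELECT REVERSE('abc.yahoo.com');                                     -- 'moc.oohay.cba'
SELECT SUBSTRING_INDEX(REVERSE('abc.yahoo.com'), '.', 2);            -- 'moc.oohay'
SELECT REVERSE(SUBSTRING_INDEX(REVERSE('abc.yahoo.com'), '.', 2));   -- 'yahoo.com'

-- A negative count makes SUBSTRING_INDEX count delimiters from the right,
-- which gives the same last-two-labels result without the REVERSE calls:
SELECT SUBSTRING_INDEX('abc.yahoo.com', '.', -2);                    -- 'yahoo.com'
```

Grouping on the reversed string works because two hosts share the same reversed prefix exactly when they share the same last two labels.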

If you need to create a UNIQUE constraint for that, you can do the following:

1. Add another field to hold the domain value

ALTER TABLE your_table ADD COLUMN `domain` VARCHAR(100) NOT NULL DEFAULT '';

2. Update it with the correct values

UPDATE your_table set domain = REVERSE(SUBSTRING_INDEX(REVERSE(col), '.', 2));

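3. Add the unique constraint. With IGNORE, rows that would violate the constraint are silently dropped; the IGNORE clause was removed from ALTER TABLE in MySQL 5.7.4, so on newer versions delete the duplicates first and use a plain ALTER TABLE.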

ALTER IGNORE TABLE your_table ADD UNIQUE domain (domain);

4. Add BEFORE INSERT and BEFORE UPDATE triggers to set the domain column

DELIMITER $$

-- Trigger names must be unique within a schema, so each trigger gets its own name.
CREATE TRIGGER `your_trigger_insert` BEFORE INSERT ON `your_table` FOR EACH ROW
BEGIN
    SET NEW.domain = REVERSE(SUBSTRING_INDEX(REVERSE(NEW.col), '.', 2));
END$$


CREATE TRIGGER `your_trigger_update` BEFORE UPDATE ON `your_table` FOR EACH ROW
BEGIN
    SET NEW.domain = REVERSE(SUBSTRING_INDEX(REVERSE(NEW.col), '.', 2));
END$$
END$$

DELIMITER ;

Note: this assumes the domain is the last 2 labels when the host is split on '.', so it will not work for a domain such as ebay.co.uk. For that you will probably need to write a stored function which returns the domain for a given host and use it instead of REVERSE(SUBSTRING_INDEX(...)).
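One possible shape for such a stored function is sketched below. The name host_domain and the hard-coded suffix list are hypothetical and only illustrate the idea; a real version would need a much fuller list of two-label suffixes, for example kept in a lookup table.

```sql
DELIMITER $$

-- Hypothetical helper: returns the registrable domain for a host name.
CREATE FUNCTION host_domain(host VARCHAR(255))
RETURNS VARCHAR(255)
DETERMINISTIC
BEGIN
    -- Last two labels, e.g. 'co.uk' for 'www.ebay.co.uk'.
    DECLARE last_two VARCHAR(255) DEFAULT SUBSTRING_INDEX(host, '.', -2);

    -- Illustrative suffix list only (assumption, far from complete).
    IF last_two IN ('co.uk', 'org.uk', 'com.au') THEN
        RETURN SUBSTRING_INDEX(host, '.', -3);   -- keep three labels
    END IF;

    RETURN last_two;                             -- keep two labels
END$$

DELIMITER ;

SELECT host_domain('abc.yahoo.com');    -- 'yahoo.com'
SELECT host_domain('www.ebay.co.uk');   -- 'ebay.co.uk'
```

With a function like this in place, the UPDATE and both triggers could call host_domain(col) or host_domain(NEW.col) instead of repeating the REVERSE(SUBSTRING_INDEX(...)) expression.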
