I've found a query that grabs all of the duplicates and groups them by the column name, but I need to display each record on its own row, grouped by the column name...
What I suspect is that multiple records with the same design column have been uploaded, and I need to be able to compare each row so I can determine which ones are active or not.
The following query seems like it would work, but it crashes MySQL each time I try to use it:
SELECT *
FROM 2009_product_catalog
WHERE sku IN (
SELECT sku
FROM 2009_product_catalog
GROUP BY sku
HAVING count(sku) > 1
)
ORDER BY sku
I need all records to show, not just records that may be duplicates. The reason is, I need to be able to compare the rest of the columns, so I can know which duplicates need to go.
Your query is logically correct. However, MySQL has some problems optimizing IN with a subquery. Try this version:
SELECT pc.*
FROM 2009_product_catalog pc join
(SELECT sku
FROM 2009_product_catalog
GROUP BY sku
HAVING count(sku) > 1
) pcsum
on pcsum.sku = pc.sku
ORDER BY pc.sku;
If that still doesn't work, make sure you have an index on 2009_product_catalog(sku, pcid) (where pcid is the unique id of each row in the table). Then try this:
select pc.*
FROM 2009_product_catalog pc
where exists (select 1
from 2009_product_catalog pc2
where pc2.sku = pc.sku and pc2.pcid <> pc.pcid
)
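To sanity-check that both rewrites pick out the same rows, here is a minimal sketch on made-up data. It uses Python's built-in sqlite3 as a stand-in for MySQL, so it only shows that the result sets match and says nothing about MySQL's optimizer. The `active` column, the sample rows and the index name `idx_sku_pcid` are assumptions for illustration, and the backticks around the table name are there because SQLite, unlike MySQL, does not accept an unquoted identifier that starts with a digit.

```python
import sqlite3

# Hypothetical sample data; sqlite3 stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `2009_product_catalog` (
    pcid   INTEGER PRIMARY KEY,   -- assumed unique row id
    sku    TEXT,
    active INTEGER                -- made-up column to compare
);
CREATE INDEX idx_sku_pcid ON `2009_product_catalog` (sku, pcid);
INSERT INTO `2009_product_catalog` (pcid, sku, active) VALUES
    (1, 'A100', 1),
    (2, 'A100', 0),
    (3, 'B200', 1),
    (4, 'C300', 1),
    (5, 'C300', 1),
    (6, 'C300', 0);
""")

# Version 1: join against the derived table of duplicated skus.
join_sql = """
SELECT pc.*
FROM `2009_product_catalog` pc
JOIN (SELECT sku
      FROM `2009_product_catalog`
      GROUP BY sku
      HAVING COUNT(sku) > 1) pcsum
  ON pcsum.sku = pc.sku
ORDER BY pc.sku, pc.pcid
"""

# Version 2: keep a row if another row shares its sku.
exists_sql = """
SELECT pc.*
FROM `2009_product_catalog` pc
WHERE EXISTS (SELECT 1
              FROM `2009_product_catalog` pc2
              WHERE pc2.sku = pc.sku AND pc2.pcid <> pc.pcid)
ORDER BY pc.sku, pc.pcid
"""

join_rows = conn.execute(join_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()

# Every row of a duplicated sku is listed once; the unique sku B200 is left out.
for row in join_rows:
    print(row)
```

On this sample both queries return rows 1, 2, 4, 5 and 6, each exactly once, so the rows can be compared side by side per sku.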
I think the IN or EXISTS statement is very heavy on performance.
Assume that your table has a field named id as your primary key. Remember to create an index on your sku field.
SELECT pc.*
FROM
2009_product_catalog pc
INNER JOIN 2009_product_catalog pc2 ON pc.sku = pc2.sku AND pc.id != pc2.id
Edit
SELECT pc.*, pc2.id as `pc2_id`
FROM
2009_product_catalog pc
LEFT OUTER JOIN 2009_product_catalog pc2 ON pc.sku = pc2.sku AND pc.id != pc2.id
This query returns all records. Every duplicated record has a non-null pc2_id; if pc2_id is null, the record is not duplicated. However, if a sku occurs more than twice, each of its records will appear in the result more than once. Is that a problem?
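To make the repetition concrete, here is a small sketch on made-up data, again using Python's built-in sqlite3 as a stand-in for MySQL (the sample rows are assumptions, and the table name is backtick-quoted because SQLite requires it). It runs the LEFT OUTER JOIN and counts how many result rows each original record produces.

```python
import sqlite3

# Hypothetical sample data; sqlite3 stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE `2009_product_catalog` (id INTEGER PRIMARY KEY, sku TEXT);
INSERT INTO `2009_product_catalog` (id, sku) VALUES
    (1, 'A100'), (2, 'A100'),              -- sku occurs twice
    (3, 'B200'),                           -- sku is unique
    (4, 'C300'), (5, 'C300'), (6, 'C300'); -- sku occurs three times
""")

rows = conn.execute("""
SELECT pc.id, pc.sku, pc2.id AS pc2_id
FROM `2009_product_catalog` pc
LEFT OUTER JOIN `2009_product_catalog` pc2
  ON pc.sku = pc2.sku AND pc.id != pc2.id
ORDER BY pc.id, pc2.id
""").fetchall()

# Count how many result rows each original record produces.
appearances = {}
for rec_id, sku, pc2_id in rows:
    appearances[rec_id] = appearances.get(rec_id, 0) + 1

# Records whose pc2_id is NULL have no other row with the same sku.
unique_ids = [rec_id for rec_id, _, pc2_id in rows if pc2_id is None]

print(appearances)
print(unique_ids)
```

In general a record whose sku occurs n times comes back n - 1 times, so the three C300 rows each appear twice here while the A100 rows appear once. If one row per record is wanted, the derived-table join or the EXISTS version avoids the repetition.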
SELECT * FROM 2009_product_catalog t1 INNER JOIN
( SELECT sku FROM 2009_product_catalog GROUP BY sku HAVING COUNT(sku) > 1 ) t2
ON t1.sku = t2.sku
This is an alternative to the original query posted in your question. It joins against a derived table instead of filtering with an IN subquery, which MySQL often executes faster.
t1 is the original table. t2 contains only the sku values that occur more than once. The result (inner join) will have every record whose sku is duplicated.