
Merging sql rows with same values

I've added some data to my database and just found out that I have a lot of duplicates (with different keys, of course). I want to merge each set of duplicates into a single record.

I'd like to do it within the SQL database itself. I don't want to truncate the table and insert the values again without duplicates, because the script is quite slow.

Here's a sample of my scenario:

Table track:

key |   artist  | title
----|-----------|--------
k1  |  artist1  | title1
----|-----------|--------
k2  |  artist1  | title1
----|-----------|--------
k3  |  artist1  | title1

Table chart:

trackKey | otherKey |  anotherKey  |  value
---------|----------|--------------|---------
k1       |   ok1    |      ak4     |    v1
---------|----------|--------------|---------
k3       |   ok2    |      ak2     |    v2
---------|----------|--------------|---------
k1       |   ok3    |      ak9     |    v2
---------|----------|--------------|---------
k2       |   ok4    |      ak1     |    v6

where chart.trackKey references track.key

The result that I'd like to achieve is:

Table track:

key |   artist  | title
----|-----------|--------
k1  |  artist1  | title1

Table chart:

trackKey | otherKey |  anotherKey  |  value
---------|----------|--------------|---------
k1       |   ok1    |      ak4     |    v1
---------|----------|--------------|---------
k1       |   ok2    |      ak2     |    v2
---------|----------|--------------|---------
k1       |   ok3    |      ak9     |    v2
---------|----------|--------------|---------
k1       |   ok4    |      ak1     |    v6

so that each duplicate of the same entry in track is merged into one row and the old keys in chart are updated with the only one that remained in the track table.

Is there any way to do this in SQL?

EDIT:

Solution #1 based on @popovitsj's answer

UPDATE chart c SET trackUri =
(WITH track_unique AS
(
    SELECT MIN(uri) AS key, artist, title, album. artwork FROM track
    GROUP BY artist, title
)
SELECT tu.key FROM chart c1
INNER JOIN track t ON c1.trackUri = t.key
INNER JOIN track_unique tu ON t.artist = tu.artist AND t.title = tu.title
WHERE c1.trackUri = c.trackUri and c1.countryId = c.countryId and c1.date = c.date);

returns

#1064 - Syntax error near 
'track_unique AS (
SELECT MIN(uri) AS key, artist, title, album. artwork FR' line 2 

Solution #2 based on @juergen d's answer

update chart
join track t1 on t1.uri = chart.trackUri
left join 
(
   select min(uri) as key
   from track 
   group by artist, title
) tmp_track on tmp_track.key = chart.trackUri
set trackkey = tmp_tbl.key
where chart.trackUri not in 
(
  select min(uri)
  from track
  group by artist, title
  having count(*) > 1
);

returns

#1064 - Syntax error near
   'key
   from track
   group by artist, title
) tmp_track on tmp_track.key = c' line 5 

I don't know what I'm doing wrong, so I'm adding the schema definitions (taken from phpMyAdmin):

[image of the schema definitions, not preserved]

The first WITH clause gets the ids you want to keep; the select query that follows then matches those ids to the chart rows.

I edited this answer based on your modification of my original answer. This answer assumes that chart(countryId, date) uniquely identifies a chart, and that tracks may be merged only if track(artist, title, album, artwork) is equal.

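-- The alias is min_uri rather than key, because KEY is a reserved word in MySQL.
-- Note: the WITH clause is not available in MySQL before 8.0.
UPDATE chart c SET trackUri =
(WITH track_unique AS
(
    -- one row per distinct track, carrying the smallest uri of its group
    SELECT MIN(uri) AS min_uri, artist, title, album, artwork FROM track
    GROUP BY artist, title, album, artwork
)
SELECT tu.min_uri FROM chart c1
INNER JOIN track t ON c1.trackUri = t.uri  -- join on uri, the column aggregated above
INNER JOIN track_unique tu
ON t.artist = tu.artist
AND t.title = tu.title
AND t.album = tu.album
AND t.artwork = tu.artwork
WHERE c1.trackUri = c.trackUri
AND c1.countryId = c.countryId
AND c1.date = c.date);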
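As a rough illustration of the same remapping idea without a WITH clause, here is a minimal sketch that runs against an in-memory SQLite database from Python. SQLite, the column names uri and trackUri, and the variable remapped are assumptions for the demo (the real schema is MySQL and has more columns); the rows are the sample data from the question. The subquery reads only from track, so it does not select from the table being updated.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (uri TEXT PRIMARY KEY, artist TEXT, title TEXT);
    CREATE TABLE chart (trackUri TEXT, otherKey TEXT, anotherKey TEXT, value TEXT);
    INSERT INTO track VALUES ('k1', 'artist1', 'title1'),
                             ('k2', 'artist1', 'title1'),
                             ('k3', 'artist1', 'title1');
    INSERT INTO chart VALUES ('k1', 'ok1', 'ak4', 'v1'),
                             ('k3', 'ok2', 'ak2', 'v2'),
                             ('k1', 'ok3', 'ak9', 'v2'),
                             ('k2', 'ok4', 'ak1', 'v6');
""")

# For each chart row, look up the smallest uri among all tracks that share
# the artist and title of the track the row currently points to.
conn.execute("""
    UPDATE chart
    SET trackUri = (
        SELECT MIN(keep.uri)
        FROM track AS cur
        JOIN track AS keep
          ON keep.artist = cur.artist AND keep.title = cur.title
        WHERE cur.uri = chart.trackUri
    )
""")

remapped = conn.execute(
    "SELECT trackUri, otherKey FROM chart ORDER BY otherKey"
).fetchall()
print(remapped)
```

With the sample data, every chart row ends up pointing at k1. A chart row whose trackUri has no match in track would be set to NULL by this statement, so a guard such as WHERE EXISTS may be worth adding on real data.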

To delete the leftover duplicates after doing this update:

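DELETE FROM track
WHERE uri NOT IN
    -- NOT IN needs a single-column subquery; the extra derived table lets
    -- MySQL read from track while deleting from it
    (SELECT min_uri FROM
        (SELECT MIN(uri) AS min_uri
         FROM track
         GROUP BY artist, title, album, artwork) AS keep);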

If the duplicate values are exact duplicates, you could use

SELECT MIN(key),artist,title FROM track GROUP BY artist,title;

to get a duplicate-free version of the data in the track table. You could put this in a temporary table and swap the two over, or use your SQL client to download the data and re-import it, or whatever. For safety's sake I wouldn't try to do it all in a single statement.
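The temporary-table swap could look roughly like the sketch below, again run against an in-memory SQLite database from Python. SQLite, the table name track_dedup, and the column name uri are assumptions for the demo. Note that a table created with CREATE TABLE ... AS SELECT does not carry over the primary key or indexes, and that chart rows pointing at the dropped keys still need to be remapped separately.

```python
import sqlite3

# In-memory SQLite database standing in for the real MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (uri TEXT PRIMARY KEY, artist TEXT, title TEXT);
    INSERT INTO track VALUES ('k1', 'artist1', 'title1'),
                             ('k2', 'artist1', 'title1'),
                             ('k3', 'artist1', 'title1');
""")

# Step 1: build the duplicate-free copy in a separate table (hypothetical name).
# Step 2: drop the original and rename the copy into its place.
conn.executescript("""
    CREATE TABLE track_dedup AS
        SELECT MIN(uri) AS uri, artist, title
        FROM track
        GROUP BY artist, title;
    DROP TABLE track;
    ALTER TABLE track_dedup RENAME TO track;
""")

swapped = conn.execute("SELECT uri, artist, title FROM track").fetchall()
print(swapped)
```

Keeping the steps separate, as the answer suggests, leaves room to inspect track_dedup before the original table is dropped.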

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 