Merging SQL rows with same values

I've added some data to my database and just found out that I have a lot of duplicates (with different keys, of course), and I want to merge them into a single record.

I'd like to do it within the SQL database itself. I don't want to truncate the table and insert the values again without duplicates, because the script is quite slow.

Here's a sample of my scenario:

Table track:

key |   artist  | title
----|-----------|--------
k1  |  artist1  | title1
----|-----------|--------
k2  |  artist1  | title1
----|-----------|--------
k3  |  artist1  | title1

Table chart:

trackKey | otherKey |  anotherKey  |  value
---------|----------|--------------|---------
k1       |   ok1    |      ak4     |    v1
---------|----------|--------------|---------
k3       |   ok2    |      ak2     |    v2
---------|----------|--------------|---------
k1       |   ok3    |      ak9     |    v2
---------|----------|--------------|---------
k2       |   ok4    |      ak1     |    v6

where chart.trackKey references track.key

The result that I'd like to achieve is:

Table track:

key |   artist  | title
----|-----------|--------
k1  |  artist1  | title1

Table chart:

trackKey | otherKey |  anotherKey  |  value
---------|----------|--------------|---------
k1       |   ok1    |      ak4     |    v1
---------|----------|--------------|---------
k1       |   ok2    |      ak2     |    v2
---------|----------|--------------|---------
k1       |   ok3    |      ak9     |    v2
---------|----------|--------------|---------
k1       |   ok4    |      ak1     |    v6

so that each set of duplicate entries in track is merged into one row, and the old keys in chart are replaced with the key of the one row that remains in the track table.

Is there any way to do this in SQL?

EDIT:

Solution #1 based on @popovitsj's answer

UPDATE chart c SET trackUri =
(WITH track_unique AS
(
    SELECT MIN(uri) AS key, artist, title, album. artwork FROM track
    GROUP BY artist, title
)
SELECT tu.key FROM chart c1
INNER JOIN track t ON c1.trackUri = t.key
INNER JOIN track_unique tu ON t.artist = tu.artist AND t.title = tu.title
WHERE c1.trackUri = c.trackUri and c1.countryId = c.countryId and c1.date = c.date);

returns

#1064 - Syntax error near 
'track_unique AS (
SELECT MIN(uri) AS key, artist, title, album. artwork FR' line 2 

Solution #2 based on @juergen d's answer

update chart
join track t1 on t1.uri = chart.trackUri
left join 
(
   select min(uri) as key
   from track 
   group by artist, title
) tmp_track on tmp_track.key = chart.trackUri
set trackkey = tmp_tbl.key
where chart.trackUri not in 
(
  select min(uri)
  from track
  group by artist, title
  having count(*) > 1
);

returns

#1064 - Syntax error near
   'key
   from track
   group by artist, title
) tmp_track on tmp_track.key = c' line 5 

I don't know what I'm doing wrong, so I'm adding the schema definitions (taken from phpMyAdmin):

[Image: schema definitions from phpMyAdmin; not available]

The first WITH clause gets the ids you want to keep; the SELECT that follows then matches those ids to the chart's track id.

I edited this answer based on your modification of my original answer. This answer assumes that chart(countryId, date) uniquely identifies a chart, and that tracks may be merged only if track(artist, title, album, artwork) is equal.

UPDATE chart c SET trackUri =
(WITH track_unique AS
(
    -- one surviving uri per distinct (artist, title, album, artwork);
    -- the alias is keep_uri rather than key, which is a reserved word in MySQL
    SELECT MIN(uri) AS keep_uri, artist, title, album, artwork FROM track
    GROUP BY artist, title, album, artwork
)
SELECT tu.keep_uri FROM chart c1
-- track is identified by uri, so join on t.uri
INNER JOIN track t ON c1.trackUri = t.uri
INNER JOIN track_unique tu
ON t.artist = tu.artist
AND t.title = tu.title
AND t.album = tu.album
AND t.artwork = tu.artwork
WHERE c1.trackUri = c.trackUri
AND c1.countryId = c.countryId
AND c1.date = c.date);
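
To check the re-pointing logic on the question's sample rows, here is a minimal sketch using Python's standard-library sqlite3 module. The table layout (track(uri, artist, title) and chart(trackUri, otherKey, anotherKey, value)), the alias keep_uri, and the helper name repoint_chart are assumptions made for illustration. SQLite's dialect is not MySQL's, and the WITH clause is placed in front of the UPDATE rather than inside the subquery, so treat this as a way to verify the idea, not as a drop-in statement.

```python
import sqlite3


def repoint_chart(conn):
    """Point every chart row at the smallest uri among its track's duplicates."""
    conn.execute(
        """
        WITH track_unique AS (
            -- one surviving uri per distinct (artist, title)
            SELECT MIN(uri) AS keep_uri, artist, title
            FROM track
            GROUP BY artist, title
        )
        UPDATE chart
        SET trackUri = (
            SELECT tu.keep_uri
            FROM track t
            JOIN track_unique tu
              ON tu.artist = t.artist AND tu.title = t.title
            WHERE t.uri = chart.trackUri
        )
        """
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE track (uri TEXT PRIMARY KEY, artist TEXT, title TEXT);
    CREATE TABLE chart (trackUri TEXT, otherKey TEXT,
                        anotherKey TEXT, value TEXT);
    INSERT INTO track VALUES ('k1', 'artist1', 'title1'),
                             ('k2', 'artist1', 'title1'),
                             ('k3', 'artist1', 'title1');
    INSERT INTO chart VALUES ('k1', 'ok1', 'ak4', 'v1'),
                             ('k3', 'ok2', 'ak2', 'v2'),
                             ('k1', 'ok3', 'ak9', 'v2'),
                             ('k2', 'ok4', 'ak1', 'v6');
    """
)
repoint_chart(conn)
rows = conn.execute(
    "SELECT trackUri, otherKey FROM chart ORDER BY otherKey"
).fetchall()
print(rows)
```

On the sample data every chart row ends up with trackUri k1, which matches the result table in the question.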

To delete the leftover duplicates after doing this update:

DELETE FROM track
WHERE uri NOT IN
    -- the subquery must return a single column to be usable with NOT IN
    (SELECT MIN(uri)
     FROM track
     GROUP BY artist, title, album, artwork);
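
As a rough check of the clean-up step, the sketch below runs the same kind of DELETE in an in-memory SQLite database through Python's sqlite3 module and then counts chart rows that point at a missing track. The reduced schema, the extra k4 row, and the helper name delete_duplicate_tracks are assumptions for illustration. It groups on (artist, title) only because the sample tables have no album or artwork columns.

```python
import sqlite3


def delete_duplicate_tracks(conn):
    """Delete every track except the smallest uri per (artist, title).

    Returns the number of rows removed.
    """
    cur = conn.execute(
        """
        DELETE FROM track
        WHERE uri NOT IN (
            SELECT MIN(uri) FROM track GROUP BY artist, title
        )
        """
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE track (uri TEXT PRIMARY KEY, artist TEXT, title TEXT);
    CREATE TABLE chart (trackUri TEXT, otherKey TEXT);
    INSERT INTO track VALUES ('k1', 'artist1', 'title1'),
                             ('k2', 'artist1', 'title1'),
                             ('k3', 'artist1', 'title1'),
                             ('k4', 'artist2', 'title2');
    -- chart rows as they would look after the re-pointing update
    INSERT INTO chart VALUES ('k1', 'ok1'), ('k1', 'ok2'), ('k4', 'ok5');
    """
)
deleted = delete_duplicate_tracks(conn)
remaining = [r[0] for r in conn.execute("SELECT uri FROM track ORDER BY uri")]
# chart rows whose track no longer exists; should be zero if the update ran first
orphans = conn.execute(
    "SELECT COUNT(*) FROM chart WHERE trackUri NOT IN (SELECT uri FROM track)"
).fetchone()[0]
print(deleted, remaining, orphans)
```

Running the orphan count before committing is a cheap way to confirm that the chart rows were re-pointed before the duplicates were removed.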

If the duplicate values are exact duplicates, you could use

SELECT MIN(`key`), artist, title FROM track GROUP BY artist, title;

to get a duplicate-free version of the data in the track table. You could put this in a temporary table and swap them over, or use your SQL client to download the data and re-import it, or whatever suits you. For safety's sake, I wouldn't try to do it all in a single statement.
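
The temporary-table route could look roughly like the following sketch, written against SQLite through Python's sqlite3 module. The names track_dedup, track_old, and swap_in_deduplicated_track are hypothetical, as is the extra k4 row. Note that CREATE TABLE ... AS SELECT copies the data but not constraints such as the primary key, and that rows in chart would still need their trackKey values re-pointed separately.

```python
import sqlite3


def swap_in_deduplicated_track(conn):
    """Build a duplicate-free copy of track, then swap it in step by step.

    The original table is kept as track_old so the swap can be undone.
    Returns (row count before, row count after).
    """
    conn.execute(
        'CREATE TABLE track_dedup AS '
        'SELECT MIN("key") AS "key", artist, title '
        'FROM track GROUP BY artist, title'
    )
    before = conn.execute("SELECT COUNT(*) FROM track").fetchone()[0]
    after = conn.execute("SELECT COUNT(*) FROM track_dedup").fetchone()[0]
    conn.execute("ALTER TABLE track RENAME TO track_old")
    conn.execute("ALTER TABLE track_dedup RENAME TO track")
    conn.commit()
    return before, after


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE track ("key" TEXT PRIMARY KEY, artist TEXT, title TEXT);
    INSERT INTO track VALUES ('k1', 'artist1', 'title1'),
                             ('k2', 'artist1', 'title1'),
                             ('k3', 'artist1', 'title1'),
                             ('k4', 'artist2', 'title2');
    """
)
before, after = swap_in_deduplicated_track(conn)
rows = conn.execute(
    'SELECT "key", artist, title FROM track ORDER BY "key"'
).fetchall()
print(before, after, rows)
```

Splitting the work into separate statements leaves room to inspect the row counts, and to keep track_old around, before anything is dropped.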
