SQL: How to merge case-insensitive duplicates
What would be the best way to remove duplicates while merging their records into one?
I have a table that keeps track of player names and their records, like this:
stats
-------------------------------
nick totalgames wins ...
John 100 40
john 200 97
Whistle 50 47
wHiStLe 75 72
...
I need to merge the rows where nick is duplicated (ignoring case) and combine their records into one, like this:
stats
-------------------------------
nick totalgames wins ...
john 300 137
whistle 125 119
...
I'm doing this in Postgres. What would be the best way to do this?
I know that I can get the names where duplicates exist by doing this:
select lower(nick) as nick, totalgames, count(*)
from stats
group by lower(nick), totalgames
having count(*) > 1;
I thought of something like this:
update stats
set totalgames = totalgames + s.totalgames
from (that query up there) s
where lower(nick) = s.nick
Except this doesn't work properly. And I still can't seem to delete the other rows containing the duplicate names. What can I do? Any suggestions?
Here is your update:
UPDATE stats
SET totalgames = x.games, wins = x.wins
FROM (SELECT LOWER(nick) AS nick, SUM(totalgames) AS games, SUM(wins) AS wins
FROM stats
GROUP BY LOWER(nick) ) AS x
WHERE LOWER(stats.nick) = x.nick;
Here is the delete to blow away the duplicate rows:
DELETE FROM stats USING stats s2
WHERE lower(stats.nick) = lower(s2.nick) AND stats.nick < s2.nick;
(Note that the 'update...from' and 'delete...using' syntaxes are Postgres-specific, and were stolen shamelessly from this answer and this answer.)
You'll probably also want to run this to lowercase all the names:
UPDATE STATS SET nick = lower(nick);
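Because the three statements above depend on one another, it may be worth running them inside a single transaction so that a failure partway through leaves the table unchanged. The following is only a sketch that reuses the statements from this answer against the stats table from the question; the transaction wrapper is an addition, not part of the original answer.

```sql
BEGIN;

-- 1. Give every row in a case-insensitive group the group's totals.
UPDATE stats
SET totalgames = x.games, wins = x.wins
FROM (SELECT lower(nick) AS nick, sum(totalgames) AS games, sum(wins) AS wins
      FROM stats
      GROUP BY lower(nick)) AS x
WHERE lower(stats.nick) = x.nick;

-- 2. Keep one row per group: delete every row for which another row
--    in the same group has a nick that sorts after it.
DELETE FROM stats USING stats s2
WHERE lower(stats.nick) = lower(s2.nick) AND stats.nick < s2.nick;

-- 3. Normalize the surviving names to lowercase.
UPDATE stats SET nick = lower(nick);

COMMIT;
```

If anything looks wrong before the final step, issuing ROLLBACK instead of COMMIT discards all three changes.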
And throw in a unique index on the lowercase version of 'nick' (or add a constraint to that column to disallow non-lowercase values):
CREATE UNIQUE INDEX ON stats (LOWER(nick));
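The alternative mentioned above, a constraint that disallows non-lowercase values, could look roughly like this. It is a minimal sketch: the constraint name stats_nick_lowercase is made up for illustration, and the statement assumes the names have already been lowercased, since PostgreSQL checks existing rows when the constraint is added.

```sql
-- stats_nick_lowercase is a hypothetical name chosen for this example.
ALTER TABLE stats
  ADD CONSTRAINT stats_nick_lowercase CHECK (nick = lower(nick));

-- With the constraint in place, an insert such as the following is rejected
-- with a check-constraint violation:
-- INSERT INTO stats (nick, totalgames, wins) VALUES ('JOHN', 1, 0);
```

The unique index lets mixed-case names in but prevents case-insensitive duplicates, while the check constraint forces a single canonical spelling; which one fits better depends on whether the original capitalization needs to be preserved.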
It can all be done in one statement, using RETURNING.
-- The data
CREATE TABLE stats
( nick VARCHAR PRIMARY KEY
, totalgames INTEGER NOT NULL DEFAULT 0
, wins INTEGER NOT NULL DEFAULT 0
);
INSERT INTO stats(nick, totalgames,wins) VALUES
( 'John', 100, 40) ,( 'john', 200, 97)
,( 'Whistle', 50, 47) ,( 'wHiStLe', 75, 72)
, ( 'Single', 42, 13 ) -- this person has only one record
;
SELECT * FROM stats;
-- The query:
WITH upd AS (
    UPDATE stats dst
    SET totalgames = src.totalgames
      , wins = src.wins
    FROM ( SELECT MIN(nick) AS nick -- pick the "lowest" nick as the canonical nick
                , SUM(totalgames) AS totalgames
                , SUM(wins) AS wins
           FROM stats
           GROUP BY lower(nick)
         ) src
    WHERE dst.nick = src.nick
    RETURNING dst.nick -- only the records that have been updated
    )
-- Delete the records that were NOT updated.
DELETE FROM stats del
WHERE NOT EXISTS (
    SELECT * FROM upd
    WHERE upd.nick = del.nick
    )
;
SELECT * FROM stats;
Output:
INSERT 0 5
nick | totalgames | wins
---------+------------+------
John | 100 | 40
john | 200 | 97
Whistle | 50 | 47
wHiStLe | 75 | 72
Single | 42 | 13
(5 rows)
DELETE 2
nick | totalgames | wins
---------+------------+------
wHiStLe | 125 | 119
john | 300 | 137
Single | 42 | 13
(3 rows)
I think the easiest way to do it in one query would be to use common table expressions:
with cte as (
    delete from stats
    where lower(nick) in (
        select lower(nick) from stats group by lower(nick) having count(*) > 1
    )
    returning *
)
insert into stats(nick, totalgames, wins)
select lower(nick), sum(totalgames), sum(wins)
from cte
group by lower(nick);
As you can see, inside the CTE I'm deleting the duplicates and returning the deleted rows, then inserting the grouped, deleted data back into the table.
See the SQL Fiddle demo.
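Whichever approach is used, a quick sanity check afterwards is to look for any names that still collide when case is ignored. This is just a sketch against the stats table from the question; the copies alias is chosen here for readability.

```sql
-- Lists every nick that still has more than one row when case is ignored.
-- After a successful merge this query should return no rows.
SELECT lower(nick) AS nick, count(*) AS copies
FROM stats
GROUP BY lower(nick)
HAVING count(*) > 1;
```

Unlike the query in the question, this groups by lower(nick) only, so rows with different totalgames values are still recognized as the same player.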
UPDATE stats SET totalgames = s.totalgames, wins = s.wins
FROM (SELECT lower(nick) AS nick, SUM(totalgames) AS totalgames, SUM(wins) AS wins
      FROM stats GROUP BY lower(nick)) s
WHERE lower(stats.nick) = s.nick;
-- Delete every row in a duplicated group except the one with the smallest nick.
-- (PostgreSQL has no built-in first() aggregate, so MIN(nick) picks the row to keep.)
DELETE FROM stats
WHERE lower(nick) IN (SELECT lower(nick) FROM stats GROUP BY lower(nick) HAVING COUNT(*) > 1)
  AND nick NOT IN (SELECT MIN(nick) FROM stats GROUP BY lower(nick));
This should work.