
SQL: How to merge case-insensitive duplicates

What would be the best way to remove duplicates while merging their records into one?

I have a table that keeps track of player names and their records, like this:

stats
-------------------------------
nick     totalgames     wins   ...
John     100            40
john     200            97
Whistle  50             47
wHiStLe  75             72
...

I need to merge the rows whose nick is duplicated (ignoring case) and combine their records into one, like this:

stats
-------------------------------
nick     totalgames     wins   ...
john     300            137
whistle  125            119
...

I'm doing this in Postgres. What would be the best way to do this?

I know that I can get the names where duplicates exist by doing this:

select lower(nick) as nick, totalgames, count(*) 
from stats 
group by lower(nick), totalgames
having count(*) > 1;
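One thing worth noting about the query above: because it also groups by totalgames, a name only counts as duplicated when its case variants have identical game counts, so on the sample rows (John 100 / john 200, Whistle 50 / wHiStLe 75) it returns nothing. A minimal sketch that groups on lower(nick) alone, assuming PostgreSQL and a temporary copy of the sample rows:

```sql
-- Temporary copy of the question's sample rows (illustration only)
CREATE TEMP TABLE stats (
    nick       text    NOT NULL,
    totalgames integer NOT NULL DEFAULT 0,
    wins       integer NOT NULL DEFAULT 0
);
INSERT INTO stats (nick, totalgames, wins) VALUES
    ('John', 100, 40), ('john', 200, 97),
    ('Whistle', 50, 47), ('wHiStLe', 75, 72);

-- One row per case-insensitive name that appears more than once.
-- Grouping on lower(nick) only, so differing totalgames values
-- do not split John/john into separate groups.
SELECT lower(nick) AS nick, count(*) AS copies
FROM stats
GROUP BY lower(nick)
HAVING count(*) > 1
ORDER BY 1;
```

On the sample rows this returns john and whistle, each with 2 copies.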

I thought of something like this:

update stats
set totalgames = totalgames + s.totalgames
from (that query up there) s
where lower(nick) = s.nick

Except this doesn't work properly, and I still can't figure out how to delete the other rows containing the duplicate names. What can I do? Any suggestions?


Here is your update:

 UPDATE stats
 SET totalgames = x.games, wins = x.wins
 FROM (SELECT LOWER(nick) AS nick, SUM(totalgames) AS games, SUM(wins) AS wins
     FROM stats
      GROUP BY LOWER(nick) ) AS x
 WHERE LOWER(stats.nick) = x.nick;

Here is the delete to blow away the duplicate rows:

 DELETE FROM stats USING stats s2
 WHERE lower(stats.nick) = lower(s2.nick) AND stats.nick < s2.nick;

(Note that the 'update...from' and 'delete...using' forms are Postgres-specific, and were stolen shamelessly from this answer and this answer.)
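Because the UPDATE and DELETE are two separate statements, it is safer to run them in one transaction so no other session ever sees the half-merged state (summed totals on every spelling, duplicates not yet deleted). A sketch, assuming PostgreSQL and the question's sample rows in a temporary table:

```sql
-- Temporary copy of the question's sample rows (illustration only)
CREATE TEMP TABLE stats (
    nick       text    NOT NULL,
    totalgames integer NOT NULL DEFAULT 0,
    wins       integer NOT NULL DEFAULT 0
);
INSERT INTO stats (nick, totalgames, wins) VALUES
    ('John', 100, 40), ('john', 200, 97),
    ('Whistle', 50, 47), ('wHiStLe', 75, 72);

BEGIN;

-- 1. Write the case-insensitive totals onto every spelling of each name
UPDATE stats
SET totalgames = x.games, wins = x.wins
FROM (SELECT lower(nick) AS nick, sum(totalgames) AS games, sum(wins) AS wins
      FROM stats
      GROUP BY lower(nick)) AS x
WHERE lower(stats.nick) = x.nick;

-- 2. Delete every row that has a case variant sorting higher, which keeps
--    the spelling that sorts last under the column's collation
DELETE FROM stats USING stats s2
WHERE lower(stats.nick) = lower(s2.nick) AND stats.nick < s2.nick;

COMMIT;

SELECT lower(nick) AS nick, totalgames, wins FROM stats ORDER BY 1;
```

The final SELECT should show john with 300/137 and whistle with 125/119.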

You'll probably also want to run this to downcase all the names:

 UPDATE STATS SET nick = lower(nick);

Aaaand throw in a unique index on the lowercase version of 'nick' (or add a constraint to that column to disallow non-lowercase values):

CREATE UNIQUE INDEX ON stats (LOWER(nick)); 
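Once that unique index exists, later case variants can be folded in at insert time instead of being cleaned up afterwards. A sketch using INSERT ... ON CONFLICT (PostgreSQL 9.5 or later) with the lower(nick) expression as the conflict target; the incoming row ('JOHN', 10, 6) is made up for illustration:

```sql
-- Illustrative table that has already been merged and lowercased
CREATE TEMP TABLE stats (
    nick       text    NOT NULL,
    totalgames integer NOT NULL DEFAULT 0,
    wins       integer NOT NULL DEFAULT 0
);
CREATE UNIQUE INDEX ON stats (lower(nick));
INSERT INTO stats (nick, totalgames, wins) VALUES ('john', 300, 137);

-- A new record for a differently-cased name: instead of failing on the
-- unique index, add its numbers to the existing row (nick stays 'john')
INSERT INTO stats (nick, totalgames, wins) VALUES ('JOHN', 10, 6)
ON CONFLICT ((lower(nick)))
DO UPDATE SET totalgames = stats.totalgames + EXCLUDED.totalgames,
              wins       = stats.wins + EXCLUDED.wins;

SELECT * FROM stats;  -- one row: john | 310 | 143
```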

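It can all be done in one statement using RETURNING: an UPDATE inside a CTE writes the summed totals onto one canonical nick per name and returns those nicks, and the outer DELETE removes every row whose nick was not returned.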

-- The data
CREATE TABLE stats
        ( nick VARCHAR PRIMARY KEY
        , totalgames INTEGER NOT NULL DEFAULT 0
        , wins INTEGER NOT NULL DEFAULT 0
        );

INSERT INTO stats(nick, totalgames,wins) VALUES
 ( 'John', 100, 40) ,( 'john', 200, 97)
,( 'Whistle', 50, 47) ,( 'wHiStLe', 75, 72)
, ( 'Single', 42, 13 ) -- this person has only one record
        ;
SELECT * FROM stats;

-- The query:
WITH upd AS (
        UPDATE stats dst
        SET totalgames = src.totalgames
                , wins = src.wins
        FROM ( SELECT MIN(nick) AS nick -- pick the "lowest" nick as the canonical nick
                , SUM(totalgames) AS totalgames
                , SUM(wins) AS wins
                FROM stats
                GROUP BY lower(nick)
                ) src
        WHERE dst.nick = src.nick
        RETURNING dst.nick -- only the records that have been updated
        )
-- Delete the records that were NOT updated.
DELETE FROM stats del
WHERE NOT EXISTS (
        SELECT * FROM upd
        WHERE upd.nick = del.nick
        )
        ;

SELECT * FROM stats;

Output:

INSERT 0 5
  nick   | totalgames | wins 
---------+------------+------
 John    |        100 |   40
 john    |        200 |   97
 Whistle |         50 |   47
 wHiStLe |         75 |   72
 Single  |         42 |   13
(5 rows)

DELETE 2
  nick   | totalgames | wins 
---------+------------+------
 wHiStLe |        125 |  119
 john    |        300 |  137
 Single  |         42 |   13
(3 rows)
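MIN(nick) compares names under the database's collation, which is presumably why the output above keeps john and wHiStLe: in locales such as en_US, lowercase letters sort before uppercase ones when the letters are otherwise equal. If you want the same spelling to survive whatever the database locale, one option (a variation on the query above, not something the answer itself does) is to take the minimum under the "C" collation, which compares byte values so uppercase ASCII letters sort first:

```sql
CREATE TEMP TABLE stats
        ( nick VARCHAR PRIMARY KEY
        , totalgames INTEGER NOT NULL DEFAULT 0
        , wins INTEGER NOT NULL DEFAULT 0
        );
INSERT INTO stats (nick, totalgames, wins) VALUES
  ('John', 100, 40), ('john', 200, 97)
, ('Whistle', 50, 47), ('wHiStLe', 75, 72)
, ('Single', 42, 13);

WITH upd AS (
        UPDATE stats dst
        SET totalgames = src.totalgames
                , wins = src.wins
        FROM ( SELECT MIN(nick COLLATE "C") AS nick -- byte-wise minimum: 'John' < 'john'
                , SUM(totalgames) AS totalgames
                , SUM(wins) AS wins
                FROM stats
                GROUP BY lower(nick)
                ) src
        -- explicit COLLATE avoids a collation conflict between dst.nick and src.nick
        WHERE dst.nick = src.nick COLLATE "C"
        RETURNING dst.nick
        )
DELETE FROM stats del
WHERE NOT EXISTS (SELECT 1 FROM upd WHERE upd.nick = del.nick);

SELECT * FROM stats ORDER BY nick COLLATE "C";
```

With this version the surviving rows are John, Single and Whistle, with the same totals as before.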

I think the easiest way to do it in one query is with a common table expression:

with cte as (
    delete from stats
    where lower(nick) in (
      select lower(nick) from stats group by lower(nick) having count(*) > 1
    )
    returning *
)
insert into stats(nick, totalgames, wins)
select lower(nick), sum(totalgames), sum(wins)
from cte
group by lower(nick);

As you can see, inside the CTE I delete the duplicated rows and return them; the outer INSERT then groups those deleted rows by lower(nick) and inserts the summed data back into the table.
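A self-contained run of that statement on the question's rows plus one name without duplicates (a sketch, assuming PostgreSQL 9.1 or later for data-modifying CTEs). Only duplicated names are deleted and re-inserted, so they come back lowercased, while a name that was never duplicated (Single here) keeps its original capitalization:

```sql
-- Temporary copy of the sample rows, plus one non-duplicated name
CREATE TEMP TABLE stats (
    nick       text    NOT NULL,
    totalgames integer NOT NULL DEFAULT 0,
    wins       integer NOT NULL DEFAULT 0
);
INSERT INTO stats (nick, totalgames, wins) VALUES
    ('John', 100, 40), ('john', 200, 97),
    ('Whistle', 50, 47), ('wHiStLe', 75, 72),
    ('Single', 42, 13);

-- Delete every row of a duplicated name, then re-insert one summed row per name
WITH cte AS (
    DELETE FROM stats
    WHERE lower(nick) IN (
        SELECT lower(nick) FROM stats GROUP BY lower(nick) HAVING count(*) > 1
    )
    RETURNING *
)
INSERT INTO stats (nick, totalgames, wins)
SELECT lower(nick), sum(totalgames), sum(wins)
FROM cte
GROUP BY lower(nick);

-- Rows: john | 300 | 137, whistle | 125 | 119, Single | 42 | 13
-- (display order depends on the collation)
SELECT * FROM stats ORDER BY nick;
```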


UPDATE stats SET totalgames = s.totalgames, wins = s.wins
FROM (SELECT lower(nick) AS nick, SUM(totalgames) AS totalgames, SUM(wins) AS wins FROM stats GROUP BY lower(nick)) s
WHERE lower(stats.nick) = s.nick;
DELETE FROM stats WHERE
lower(nick) IN (SELECT lower(nick) FROM stats GROUP BY lower(nick) HAVING COUNT(*) > 1)
AND nick NOT IN (SELECT MIN(nick) FROM stats GROUP BY lower(nick));

This should work. MIN(nick) picks the one spelling to keep for each name (PostgreSQL has no built-in first() aggregate).

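Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.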

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM