In SQL, find duplicates in one column with unique values for another column

I have a table of aliases linked to record IDs. I need to find aliases that are shared by more than one distinct record ID. To illustrate:

ID    Alias     Record ID
1     000123    4
2     000123    4
3     000234    4
4     000123    6
5     000345    6
6     000345    7

The result of a query on this table should be something to the effect of:

000123    4    6
000345    6    7

This indicates that records 4 and 6 both have the alias 000123, and records 6 and 7 both have the alias 000345.

I looked into using GROUP BY, but if I group by alias alone I can't select the record ID, and if I group by both alias and record ID the query only returns the first two rows of this example, where both columns are duplicated. The only solution I've found, and it's a terrible one that crashed my server, is to select all the data twice and then join the two result sets:

ON [T_1].[ALIAS] = [T_2].[ALIAS] AND NOT [T_1].[RECORD_ID] = [T_2].[RECORD_ID]

Are there any solutions that would work better? As in, ones that don't crash my server when run on a few hundred thousand records?

It looks as if you have two requirements:看起来你有两个要求:

  1. Identify all aliases that have more than one record ID, and
  2. List the record IDs for these aliases horizontally.

The first is a lot easier to do than the second. Here's some SQL that ought to get you what you want for the first:

WITH A   -- Get a list of unique combinations of Alias and [Record ID]
AS  (
   SELECT Distinct
          Alias
     ,    [Record ID]
   FROM  T1
)
,   B  -- Get a list of all those Alias values that have more than one [Record ID] associated
AS  (
    SELECT Alias
    FROM   A
    GROUP BY
           Alias
    HAVING COUNT(*) > 1
)
SELECT  A.Alias
    ,   A.[Record ID]
FROM    A
    JOIN B
        ON  A.Alias = B.Alias
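
As a rough check of the query above, here is a minimal sketch that runs it against the sample rows from the question. It assumes Python's built-in sqlite3 module rather than SQL Server (SQLite accepts the bracketed [Record ID] identifier for compatibility), and the helper name find_shared_aliases and the added ORDER BY are illustrative, not part of the original answer.

```python
import sqlite3

# Sample rows from the question: (ID, Alias, Record ID).
ROWS = [
    (1, "000123", 4),
    (2, "000123", 4),
    (3, "000234", 4),
    (4, "000123", 6),
    (5, "000345", 6),
    (6, "000345", 7),
]

# Same two-step CTE as the answer; ORDER BY is added only to make
# the output deterministic for this sketch.
QUERY = """
WITH A AS (
    SELECT DISTINCT Alias, [Record ID]
    FROM T1
),
B AS (
    SELECT Alias
    FROM A
    GROUP BY Alias
    HAVING COUNT(*) > 1
)
SELECT A.Alias, A.[Record ID]
FROM A
JOIN B ON A.Alias = B.Alias
ORDER BY A.Alias, A.[Record ID]
"""


def find_shared_aliases(rows):
    """Load rows into an in-memory table and return (alias, record id) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE T1 (ID INTEGER PRIMARY KEY, Alias TEXT, [Record ID] INTEGER)"
    )
    conn.executemany("INSERT INTO T1 VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


pairs = find_shared_aliases(ROWS)
for alias, record_id in pairs:
    print(alias, record_id)
```

Alias 000234 drops out because it maps to a single record ID, and the duplicated (000123, 4) row is collapsed by the DISTINCT in the first step.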

Now, as for the second. If you're satisfied with the data in this form:

Alias     Record ID
000123    4
000123    6
000345    6
000345    7

... you can stop there. Otherwise, things get tricky.

The PIVOT command will not necessarily help you, because it's trying to solve a different problem from the one you have.

I am assuming that you can't necessarily predict how many Record ID values share each Alias, and thus don't know how many columns you'll need.

If you have only two, then displaying each of them in its own column becomes a relatively trivial exercise. If you have more, I'd urge you to consider whether the destination for these records (a report? a web page? Excel?) might do a better job of displaying them horizontally than SQL Server can do in returning them arranged horizontally.
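
To illustrate handing the horizontal layout to the destination, here is a small sketch that takes the vertical (Alias, Record ID) pairs and widens them on the client side. It assumes the consumer is a Python script; the function name pivot_by_alias is hypothetical, and the same idea applies to a report tool or spreadsheet.

```python
from collections import defaultdict

# Vertical result set, in the Alias / Record ID form shown above.
pairs = [("000123", 4), ("000123", 6), ("000345", 6), ("000345", 7)]


def pivot_by_alias(pairs):
    """Group record IDs under their alias, however many there are."""
    grouped = defaultdict(list)
    for alias, record_id in pairs:
        grouped[alias].append(record_id)
    return {alias: sorted(ids) for alias, ids in grouped.items()}


wide = pivot_by_alias(pairs)
for alias, ids in sorted(wide.items()):
    # One line per alias, with as many ID columns as that alias needs.
    print(alias, *ids, sep="    ")
```

Because each alias carries a list, the number of columns never has to be known in advance, which is the part that is awkward to express in SQL.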

Perhaps what you want is just the min() and max() of RecordId:

select Alias, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having min(RecordId) <> max(RecordId)

You can also count the number of distinct values, using count(distinct):

select Alias, count(distinct RecordId) as NumRecordIds, min(RecordID), max(RecordId)
from yourTable t
group by Alias
having count(DISTINCT RecordID) > 1;
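
Here is a minimal sketch of the count(distinct) query run through Python's built-in sqlite3 module (an assumption; the thread targets SQL Server). The seventh row is a hypothetical addition to the question's data, included to show what happens when an alias has more than two record IDs; the column aliases MinRecordId and MaxRecordId are also added for readability.

```python
import sqlite3

# Rows from the question, plus one hypothetical extra row (ID 7) so that
# alias 000345 ends up with three distinct record IDs.
ROWS = [
    (1, "000123", 4),
    (2, "000123", 4),
    (3, "000234", 4),
    (4, "000123", 6),
    (5, "000345", 6),
    (6, "000345", 7),
    (7, "000345", 9),  # hypothetical, not in the original table
]

QUERY = """
select Alias,
       count(distinct RecordId) as NumRecordIds,
       min(RecordId) as MinRecordId,
       max(RecordId) as MaxRecordId
from yourTable t
group by Alias
having count(distinct RecordId) > 1
order by Alias
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table yourTable (ID integer primary key, Alias text, RecordId integer)"
)
conn.executemany("insert into yourTable values (?, ?, ?)", ROWS)
summary = conn.execute(QUERY).fetchall()
conn.close()

for row in summary:
    print(row)
```

For 000345 the count is 3 while min and max report only 6 and 9, so the middle ID (7) is not shown. The min/max form is a complete answer only when each alias has at most two record IDs.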

This will return every alias in which some record ID is repeated, that is, where the same alias and record ID pair occurs more than once:

select Alias, count(RecordId) as NumRecordIds
from yourTable t
group by Alias
having count(RecordId) <> count(distinct RecordId);
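
A quick sketch of this last query against the question's sample rows, again assuming Python's built-in sqlite3 module in place of SQL Server and written without a trailing comma in the select list. It suggests that this query answers a slightly different question from the one asked.

```python
import sqlite3

# Sample rows from the question: (ID, Alias, RecordId).
ROWS = [
    (1, "000123", 4),
    (2, "000123", 4),
    (3, "000234", 4),
    (4, "000123", 6),
    (5, "000345", 6),
    (6, "000345", 7),
]

QUERY = """
select Alias, count(RecordId) as NumRecordIds
from yourTable t
group by Alias
having count(RecordId) <> count(distinct RecordId)
order by Alias
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table yourTable (ID integer primary key, Alias text, RecordId integer)"
)
conn.executemany("insert into yourTable values (?, ?, ?)", ROWS)
repeated = conn.execute(QUERY).fetchall()
conn.close()

print(repeated)
```

Only 000123 comes back, because it is the only alias with a repeated (Alias, RecordId) pair (rows 1 and 2). Alias 000345 is shared by records 6 and 7 but has no repeated pair, so this query skips it.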
