SQL - Only select row that is not duplicated

I need to transfer data from one table to another. The second table has a primary key constraint (the first one has no constraints). They have the same structure. What I want is to select all rows from table A and insert them into table B without the duplicate rows (if a row is duplicated, I only want to take the first one I find).

Example:

MyField1 (PK) | MyField2 (PK) | MyField3 (PK) | MyField4 | MyField5
--------------|---------------|---------------|----------|---------
1             | 'Test'        | 'A1'          | 'Data1'  | 'Data1'
2             | 'Test1'       | 'A2'          | 'Data2'  | 'Data2'
2             | 'Test1'       | 'A2'          | 'Data3'  | 'Data3'
4             | 'Test2'       | 'A3'          | 'Data4'  | 'Data4'

As you can see, the second and third rows have the same primary key but different data in MyField4 and MyField5. So, in this example, I would like to get the first, second, and fourth rows, but not the third one, because it duplicates the key of the second (even though MyField4 and MyField5 contain different data).

How can I do that with a single SELECT?

Thanks.

First, you need to define what makes a row "first". I'll make up an arbitrary definition and you can change the SQL as needed. For this example, I assume "first" means the lowest value of MyField4 and, if those are equal, the lowest value of MyField5. The query also accounts for the possibility of all 5 columns being identical.

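-- Keep a row only if no other row with the same key sorts before it
-- on (MyField4, MyField5); DISTINCT collapses fully identical rows.
SELECT DISTINCT
     T1.MyField1,
     T1.MyField2,
     T1.MyField3,
     T1.MyField4,
     T1.MyField5
FROM
     MyTable T1
LEFT OUTER JOIN MyTable T2 ON
     T2.MyField1 = T1.MyField1 AND
     T2.MyField2 = T1.MyField2 AND
     T2.MyField3 = T1.MyField3 AND
     (
          T2.MyField4 < T1.MyField4 OR
          (
               T2.MyField4 = T1.MyField4 AND
               T2.MyField5 < T1.MyField5
          )
     )
WHERE
     T2.MyField1 IS NULL  -- no earlier row matched, so T1 is the "first"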

If you also need to handle PKs that are not duplicated in the source table but already exist in your destination table, then you'll need to account for that too.
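As a rough sketch of that last point, the anti-join can be combined with a NOT EXISTS check against the destination table. The example below runs on SQLite through Python's sqlite3 module purely so it is self-contained; the table names A and B come from the question, the pre-existing 'Existing' row in B is made up for illustration, and "first" is again taken to mean the lowest MyField4, then MyField5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT,
                MyField4 TEXT, MyField5 TEXT);
CREATE TABLE B (MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT,
                MyField4 TEXT, MyField5 TEXT,
                PRIMARY KEY (MyField1, MyField2, MyField3));
INSERT INTO A VALUES (1, 'Test',  'A1', 'Data1', 'Data1'),
                     (2, 'Test1', 'A2', 'Data2', 'Data2'),
                     (2, 'Test1', 'A2', 'Data3', 'Data3'),
                     (4, 'Test2', 'A3', 'Data4', 'Data4');
-- Hypothetical row that is already present in the destination.
INSERT INTO B VALUES (4, 'Test2', 'A3', 'Existing', 'Existing');
""")

conn.execute("""
INSERT INTO B
SELECT DISTINCT T1.*
FROM A T1
LEFT OUTER JOIN A T2 ON
     T2.MyField1 = T1.MyField1 AND
     T2.MyField2 = T1.MyField2 AND
     T2.MyField3 = T1.MyField3 AND
     (T2.MyField4 < T1.MyField4 OR
      (T2.MyField4 = T1.MyField4 AND T2.MyField5 < T1.MyField5))
WHERE T2.MyField1 IS NULL          -- T1 is the "first" row for its key
  AND NOT EXISTS (                 -- and that key is not in B yet
        SELECT 1 FROM B
        WHERE B.MyField1 = T1.MyField1
          AND B.MyField2 = T1.MyField2
          AND B.MyField3 = T1.MyField3)
""")

rows = conn.execute("SELECT * FROM B ORDER BY MyField1").fetchall()
for row in rows:
    print(row)
```

The key (4, 'Test2', 'A3') keeps the data that was already in B, and only one of the two rows with key (2, 'Test1', 'A2') is copied.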

I'm not sure how you decide which of row 2 and row 3 you want in the new table, but in MySQL you can simply do:

insert ignore into new_table (select * from old_table);

The PK then prevents the duplicate entries from being inserted.
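To try the idea without a MySQL server, here is a small sketch that uses SQLite's INSERT OR IGNORE as a stand-in for MySQL's INSERT IGNORE. That substitution is an assumption made for the demo (the two statements are similar in spirit but not identical in every detail), and the ORDER BY is added here only to make the surviving row predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_table (MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT,
                        MyField4 TEXT, MyField5 TEXT);
CREATE TABLE new_table (MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT,
                        MyField4 TEXT, MyField5 TEXT,
                        PRIMARY KEY (MyField1, MyField2, MyField3));
INSERT INTO old_table VALUES (1, 'Test',  'A1', 'Data1', 'Data1'),
                             (2, 'Test1', 'A2', 'Data2', 'Data2'),
                             (2, 'Test1', 'A2', 'Data3', 'Data3'),
                             (4, 'Test2', 'A3', 'Data4', 'Data4');
""")

# Rows that would violate the primary key are skipped instead of
# aborting the whole statement.
conn.execute("""
INSERT OR IGNORE INTO new_table
SELECT * FROM old_table
ORDER BY MyField1, MyField2, MyField3, MyField4, MyField5
""")

kept = conn.execute("SELECT * FROM new_table ORDER BY MyField1").fetchall()
for row in kept:
    print(row)
```

Whichever row reaches the table first for a given key is the one that stays, so the order in which the SELECT produces rows is what decides between row 2 and row 3.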

What is your database? In Oracle you could say:

SELECT * FROM your_table
WHERE rowid in
(SELECT MIN(rowid)
 FROM your_table
 GROUP BY MyField1, MyField2, MyField3);

Note that it is not well defined which of the rows with the same PK will be considered "first". If you need to impose a specific order, you need to additionally sort on the other columns.
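The same MIN(rowid) pattern can be tried on SQLite, which also exposes an implicit rowid column. This is only an analogy: SQLite's rowid is an integer assigned at insert time, not a physical address like Oracle's ROWID, and the table name your_table is the placeholder from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (MyField1 INTEGER, MyField2 TEXT, MyField3 TEXT,
                         MyField4 TEXT, MyField5 TEXT);
INSERT INTO your_table VALUES (1, 'Test',  'A1', 'Data1', 'Data1'),
                              (2, 'Test1', 'A2', 'Data2', 'Data2'),
                              (2, 'Test1', 'A2', 'Data3', 'Data3'),
                              (4, 'Test2', 'A3', 'Data4', 'Data4');
""")

# One rowid per key group: the smallest, i.e. the earliest inserted row here.
first_rows = conn.execute("""
SELECT * FROM your_table
WHERE rowid IN (SELECT MIN(rowid)
                FROM your_table
                GROUP BY MyField1, MyField2, MyField3)
ORDER BY MyField1
""").fetchall()

for row in first_rows:
    print(row)
```

In this sketch "first" simply means "inserted earliest", which is exactly the kind of implicit ordering the note above warns about.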

It depends on what you're looking for.

There's a big difference between using JOIN + WHERE NULL, NOT IN, and NOT EXISTS, including in performance, which matters more with larger data sets.

(See "NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL".)

The three methods shown in the linked article are pretty straightforward.
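One difference that is easy to reproduce is how the three forms treat NULLs. The sketch below is not taken from the linked article; it uses two hypothetical single-column tables, src and dst, on SQLite, and asks each form for the keys in src that are missing from dst.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (k INTEGER);
CREATE TABLE dst (k INTEGER);
INSERT INTO src VALUES (1), (2), (4);
INSERT INTO dst VALUES (4), (NULL);   -- note the NULL key
""")

left_join = conn.execute("""
SELECT s.k FROM src s
LEFT JOIN dst d ON d.k = s.k
WHERE d.k IS NULL
ORDER BY s.k
""").fetchall()

not_exists = conn.execute("""
SELECT s.k FROM src s
WHERE NOT EXISTS (SELECT 1 FROM dst d WHERE d.k = s.k)
ORDER BY s.k
""").fetchall()

# "x NOT IN (4, NULL)" evaluates to NULL rather than true for x = 1 or 2,
# so this form filters out every row.
not_in = conn.execute("""
SELECT s.k FROM src s
WHERE s.k NOT IN (SELECT k FROM dst)
ORDER BY s.k
""").fetchall()

print(left_join, not_exists, not_in)
```

With a NULL in the subquery, LEFT JOIN / IS NULL and NOT EXISTS both return keys 1 and 2, while NOT IN returns nothing. When the compared columns are declared NOT NULL, as primary key columns usually are, the three forms return the same rows and the choice comes down mostly to performance.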

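-- Copy A into a temp table with an IDENTITY column to number the rows.
-- Without an ORDER BY on the INSERT, the numbering order is not guaranteed.
CREATE TABLE #A(
ID INTEGER IDENTITY,
[MyField1] [int] NULL,
[MyField2] [varchar](10) NULL,
[MyField3] [varchar](10) NULL,
[MyField4] [varchar](10) NULL,
[MyField5] [varchar](10) NULL
)

INSERT INTO #A (MyField1,MyField2,MyField3,MyField4,MyField5) SELECT * FROM A

-- Keep only the lowest-ID row for each (MyField1, MyField2, MyField3) key.
insert into B
   select MyField1,MyField2,MyField3,MyField4,MyField5 from #A a1
    where not exists (select a2.ID from #A a2
                       where a2.MyField1 = a1.MyField1
                         and a2.MyField2 = a1.MyField2
                         and a2.MyField3 = a1.MyField3
                         and a2.ID < a1.ID)

DROP TABLE #A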

OR

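-- Keep a row only if no row with the same key has a smaller
-- (MyField4, MyField5); distinct collapses fully identical rows.
insert into b
  select distinct * from a a1
    where not exists (
  select a2.MyField1 from a a2
   where a2.MyField1 = a1.MyField1
     and a2.MyField2 = a1.MyField2
     and a2.MyField3 = a1.MyField3
     and (a2.MyField4 < a1.MyField4
          or (a2.MyField4 = a1.MyField4 and a2.MyField5 < a1.MyField5)))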
