How to identify and delete duplicates in SQL defined by a substring

I have a complex de-duplication problem in SQL that I could use some advice on:

I have a table with airport codes. However, there are duplicates in some cases where one row lists the local airport ID, while another lists the ICAO (international) ID, which includes a leading K.

I need to identify duplicate pairs such as KI80 and I80, or KX49 and X49.

Note that there are many valid rows that start with a K.

Step 1: I need to identify the duplicates for the above cases.

Step 2: I need to use SQL to automatically delete all duplicates which have the leading K.

Step 3: I need to identify which rows in a different table, table b, were using the identifiers that I just deleted, so I can update them to the surviving ID (example: if they used KI80, I need to change them to I80 in that table).

Any help would be greatly appreciated!

You can use a self join in a delete statement. The idea is to join the table to itself, matching on a "K" prefix. If a match exists, then the "K" record is a duplicate:

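-- `table` is a placeholder for the real table name
-- t is the K-prefixed candidate; tnotk is the matching row without the K
delete t
    from table t join
         table tnotk
         on t.airportID = concat('K', tnotk.airportID) and tnotk.airportID not like 'K%'
    where t.airportID like 'K%';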

Note: this assumes that no non-ICAO airport IDs start with a "K".
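The delete above covers Step 2 only. As a rough sketch of how all three steps might fit together, the script below runs the same self join against an in-memory SQLite database through Python's sqlite3 module. Everything named here is an assumption made for illustration: the table names `airports` and `table_b`, the helper table `dup_pairs`, and the sample rows. SQLite is used only so the example is self-contained; it concatenates with `||` rather than `concat()`, and other databases will need their own syntax. Saving the matched pairs first means the references can be repointed regardless of when the delete runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and sample data, not from the original question
    CREATE TABLE airports (airportID TEXT PRIMARY KEY);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, airportID TEXT);
    INSERT INTO airports VALUES ('KI80'), ('I80'), ('KX49'), ('X49'), ('KJFK');
    INSERT INTO table_b (airportID) VALUES ('KI80'), ('X49'), ('KJFK');

    -- Step 1: record each K-prefixed row together with its surviving local ID
    CREATE TEMP TABLE dup_pairs AS
    SELECT k.airportID AS icao_id, loc.airportID AS local_id
    FROM airports AS k
    JOIN airports AS loc ON k.airportID = 'K' || loc.airportID
    WHERE loc.airportID NOT LIKE 'K%';

    -- Step 3: repoint rows in table_b that still use a duplicate ID
    UPDATE table_b
    SET airportID = (SELECT local_id FROM dup_pairs
                     WHERE icao_id = table_b.airportID)
    WHERE airportID IN (SELECT icao_id FROM dup_pairs);

    -- Step 2: delete the K-prefixed duplicates themselves
    DELETE FROM airports
    WHERE airportID IN (SELECT icao_id FROM dup_pairs);
""")

# Read back the results of each step
pairs = conn.execute(
    "SELECT icao_id, local_id FROM dup_pairs ORDER BY icao_id").fetchall()
remaining = [row[0] for row in conn.execute(
    "SELECT airportID FROM airports ORDER BY airportID")]
refs = [row[0] for row in conn.execute(
    "SELECT airportID FROM table_b ORDER BY id")]

print("duplicate pairs:", pairs)
print("remaining airports:", remaining)
print("table_b references:", refs)
```

In this sample data KJFK has no K-less counterpart, so it is neither listed as a duplicate nor deleted, and the table_b row that refers to it is left alone.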
