How to identify and delete duplicates in SQL defined by a substring
I have a complex de-duplication issue in SQL that I could use some advice on:
I have a table with airport codes. However, there are duplicates in some cases where one row lists the local airport ID, while another lists the ICAO (international) ID, which includes a leading K.
I need to identify duplicates such as the following:
KI80 and I80
KX49 and X49
Note that there are many valid rows that start with a K.
Step 1: I need to identify the duplicates for the above cases.
Step 2: I need to use SQL to automatically delete all duplicates which have the leading K.
Step 3: I need to identify, in a different table (table b), which rows were using the identifiers that I just deleted, so I can update them to the surviving ID (example: if they used KI80, I need to change them to I80 in this new table).
Any help would be greatly appreciated!
You can use a self join in a delete statement. The idea is to join the table to itself, matching each row against the same ID with a "K" prefix. If a match exists, then the "K" record is a duplicate:
-- "table" is a placeholder; substitute the real table name.
delete t
from table t join
     table tnotk
     on t.airportID = concat('K', tnotk.airportID) and
        tnotk.airportID not like 'K%'
where t.airportID like 'K%';
Note: this assumes that no non-ICAO airport ids start with a "K".
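
The delete statement covers Step 2 only. As a rough sketch of Steps 1 and 3 in the same MySQL-style syntax, the same self join can be reused for a preview query and for an update of the second table. The table names airports and table_b are hypothetical stand-ins (the question does not give the real names), and both are assumed to have an airportID column:

```sql
-- Assumed names: airports (the table being de-duplicated) and
-- table_b (the table that references it); both have an airportID column.

-- Step 1: preview the duplicate pairs without changing anything.
SELECT k.airportID   AS icao_id,
       loc.airportID AS local_id
FROM airports k
JOIN airports loc
  ON k.airportID = CONCAT('K', loc.airportID)
 AND loc.airportID NOT LIKE 'K%';

-- Step 3: repoint rows in table_b from the "K" ID to the surviving local ID.
-- This relies on the "K" rows still being present in airports,
-- so it should run before the delete statement.
UPDATE table_b b
JOIN airports k
  ON k.airportID = b.airportID
JOIN airports loc
  ON k.airportID = CONCAT('K', loc.airportID)
 AND loc.airportID NOT LIKE 'K%'
SET b.airportID = loc.airportID;
```

Running the update before the delete also avoids trouble if table_b has a foreign key on airportID, since no row in table_b would still point at a "K" ID when it is removed. If airportID must be unique in table_b, the update could collide with an existing row, so it is worth checking the preview output first.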