
How to find possible duplicates in a row using an SQL query?

I want a query to find the possible duplicates in a row. E.g., Table A:

Name
------------
1.Rajaraju.    
2.Rajuraja.     
3.Vijay. 
4.Ramkumar. 
5.Kumarram.  
6.Sakthi. 
7.Raj ram Ravi. 
8.Ravi Raj ram. 

I want a query that picks all the names which are similar, i.e. Rajaraju and Rajuraja should count as the same, and likewise Raj ram Ravi and Ravi Raj ram.

UTL_MATCH.JARO_WINKLER_SIMILARITY (an Oracle function that returns an integer from 0 to 100) might be one choice. A higher value means a better match.
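As a minimal sketch, assuming an Oracle database where the UTL_MATCH package is available, a single pair of names can be scored directly. The two literals are the first two names from the question with the numbering and dots removed; the column alias sim is only illustrative.

```sql
-- Score one pair of names; the result is an integer between 0 and 100,
-- where a higher value means a closer match.
select utl_match.jaro_winkler_similarity('Rajaraju', 'Rajuraja') sim
from dual;
```

The sample output in this answer reports 87 for this pair.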

For example, I sorted the result by similarity in descending order and showed only the first few rows. You should decide which similarity value satisfies your needs and apply another condition, e.g. where sim >= 80.

with test (name) as
  (select '1.Rajaraju.' from dual union all
   select '2.Rajuraja.' from dual union all
   select '3.Vijay.' from dual union all
   select '4.Ramkumar.' from dual union all
   select '5.Kumarram.' from dual union all
   select '6.Sakthi.' from dual union all
   select '7.Raj ram Ravi.' from dual union all
   select '8.Ravi Raj ram.' from dual
  ),
-- remove every digit and dot (the leading 'x' is a placeholder that maps to itself)
inter as
  (select translate(t.name, 'x.0123456789', 'x') name
   from test t
  )
-- compare each pair of names once and compute their similarity
select a.name,
       b.name,
       utl_match.jaro_winkler_similarity(a.name, b.name) sim
from inter a cross join inter b
where a.name < b.name
order by 3 desc;

NAME                 NAME                        SIM
-------------------- -------------------- ----------
Rajaraju             Rajuraja                     87
Raj ram Ravi         Rajuraja                     82
Raj ram Ravi         Ravi Raj ram                 80
Raj ram Ravi         Rajaraju                     78
Rajuraja             Ramkumar                     74
Rajaraju             Ramkumar                     74
Kumarram             Ramkumar                     72
Rajaraju             Ravi Raj ram                 71
Rajuraja             Ravi Raj ram                 71
<snip>
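
The threshold condition mentioned above can be sketched as follows. A column alias such as sim cannot be referenced in the WHERE clause of the query block that defines it, so this sketch computes the score in a second CTE and filters in the outer query. The names scored, name_a and name_b are illustrative, the input is a cleaned subset of the sample names, and 80 is only the example threshold from the text.

```sql
with test (name) as
  (select 'Rajaraju' from dual union all
   select 'Rajuraja' from dual union all
   select 'Vijay' from dual union all
   select 'Raj ram Ravi' from dual union all
   select 'Ravi Raj ram' from dual
  ),
-- score each unordered pair of names exactly once
scored as
  (select a.name name_a,
          b.name name_b,
          utl_match.jaro_winkler_similarity(a.name, b.name) sim
   from test a cross join test b
   where a.name < b.name
  )
-- keep only the pairs at or above the chosen threshold
select name_a, name_b, sim
from scored
where sim >= 80
order by sim desc;
```

Judging by the output above, a threshold of 80 would also keep the pair Raj ram Ravi / Rajuraja (82), which is not one of the duplicates the question asks for, so the cut-off is worth checking against real data before relying on it.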
