How to find possible duplicates in a row using an SQL query?
I want a query to find the possible duplicates in a row. E.g., Table A:
Name
------------
1.Rajaraju.
2.Rajuraja.
3.Vijay.
4.Ramkumar.
5.Kumarram.
6.Sakthi.
7.Raj ram Ravi.
8.Ravi Raj ram.
I want a query to pick all the names that are similar, i.e. Rajaraju and Rajuraja should be treated as the same, and similarly Raj ram Ravi and Ravi Raj ram.
UTL_MATCH.JARO_WINKLER_SIMILARITY might be one choice. A higher value means a better match.
For example, I sorted the result by similarity in descending order and showed only the first several rows. You should decide which similarity value satisfies your needs and apply another condition, e.g. where sim >= 80.
with test (name) as
  (select '1.Rajaraju.'     from dual union all
   select '2.Rajuraja.'     from dual union all
   select '3.Vijay.'        from dual union all
   select '4.Ramkumar.'     from dual union all
   select '5.Kumarram.'     from dual union all
   select '6.Sakthi.'       from dual union all
   select '7.Raj ram Ravi.' from dual union all
   select '8.Ravi Raj ram.' from dual
  ),
-- remove leading numbers and dots
inter as
  (select translate(t.name, 'x.0123456789', 'x') name
   from test t
  )
-- find similarity
select a.name,
       b.name,
       utl_match.jaro_winkler_similarity(a.name, b.name) sim
from inter a cross join inter b
where a.name < b.name
order by 3 desc;
NAME                 NAME                        SIM
-------------------- -------------------- ----------
Rajaraju             Rajuraja                     87
Raj ram Ravi         Rajuraja                     82
Raj ram Ravi         Ravi Raj ram                 80
Raj ram Ravi         Rajaraju                     78
Rajuraja             Ramkumar                     74
Rajaraju             Ramkumar                     74
Kumarram             Ramkumar                     72
Rajaraju             Ravi Raj ram                 71
Rajuraja             Ravi Raj ram                 71
<snip>
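To illustrate the threshold idea, here is a minimal sketch that wraps the similarity calculation in one more CTE and keeps only the pairs scoring at least 80. The CTE name pairs, the column aliases name_a and name_b, and the cut-off of 80 are assumptions chosen for illustration; pick whatever threshold suits your data.

```sql
with test (name) as
  (select '1.Rajaraju.'     from dual union all
   select '2.Rajuraja.'     from dual union all
   select '3.Vijay.'        from dual union all
   select '4.Ramkumar.'     from dual union all
   select '5.Kumarram.'     from dual union all
   select '6.Sakthi.'       from dual union all
   select '7.Raj ram Ravi.' from dual union all
   select '8.Ravi Raj ram.' from dual
  ),
-- remove leading numbers and dots (same step as in the query above)
inter as
  (select translate(t.name, 'x.0123456789', 'x') name
   from test t
  ),
-- hypothetical helper CTE: score every unordered pair exactly once
pairs as
  (select a.name as name_a,
          b.name as name_b,
          utl_match.jaro_winkler_similarity(a.name, b.name) as sim
   from inter a cross join inter b
   where a.name < b.name
  )
-- keep only the pairs at or above the assumed threshold
select name_a, name_b, sim
from pairs
where sim >= 80
order by sim desc;
```

Going by the scores listed above, this should leave only the first three pairs (87, 82 and 80). Note that Raj ram Ravi / Rajuraja is among them, so a threshold alone may still return pairs you would not consider duplicates.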