Oracle - SQL Statement Poor Performance - Fuzzy Matching Logic

I have a fuzzy matching requirement, for example:

Table 1 (T1): T1Col1, T1Col2, T1Col3, T1Col4, T1Col5
Table 2 (T2): T2Col1, T2Col2, T2Col3, T2Col4, T2Col5

So my requirements are:

  • T1: not all fields (T1Col1, T1Col2, T1Col3, T1Col4, T1Col5) are guaranteed to be non-null; there are cases where T1Col2 is populated while T1Col3, T1Col4 and T1Col5 are null. The best case is that all fields are non-null, and the worst case is that every field except T1Col1 is null.

  • I came up with the following fuzzy matching logic, so that if at least one field matches, the WHERE clause should pass.

select count(*) from T1, T2
where
  Nvl(T1Col1, nvl(T2Col1, 'x')) = nvl(T2Col1, 'x') and
  Nvl(T1Col2, nvl(T2Col2, 'x')) = nvl(T2Col2, 'x') and
  Nvl(T1Col3, nvl(T2Col3, 'x')) = nvl(T2Col3, 'x') and
  Nvl(T1Col4, nvl(T2Col4, 'x')) = nvl(T2Col4, 'x') and
  substr(T1Col5, 1, 1) = T2Col5
  ;
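To see what each NVL condition actually accepts, here is a minimal sketch that evaluates one column condition for a few value pairs. It assumes SQLite (Python's built-in sqlite3 module) as a stand-in for Oracle; SQLite has no NVL, so COALESCE with two arguments takes its place. The function name nvl_predicate is hypothetical. Note that the conditions in the query are combined with AND, so a row pair passes only when every column condition holds.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; COALESCE(a, b) plays the role of NVL(a, b).
_conn = sqlite3.connect(":memory:")


def nvl_predicate(t1_value, t2_value):
    """Evaluate one column condition from the question's WHERE clause:
        NVL(T1ColN, NVL(T2ColN, 'x')) = NVL(T2ColN, 'x')
    Python None stands for SQL NULL. Both sides are never NULL, so the
    comparison always yields 1 or 0."""
    row = _conn.execute(
        "SELECT COALESCE(?, COALESCE(?, 'x')) = COALESCE(?, 'x')",
        (t1_value, t2_value, t2_value),
    ).fetchone()
    return bool(row[0])


if __name__ == "__main__":
    cases = [
        (None, "a"),   # T1 null: accepted whatever T2 holds
        (None, None),  # both null: accepted
        ("a", "a"),    # equal values: accepted
        ("a", "b"),    # different values: rejected
        ("a", None),   # T1 set, T2 null: rejected
        ("x", None),   # ...unless T1 holds the sentinel 'x'
    ]
    for t1, t2 in cases:
        print(t1, t2, nvl_predicate(t1, t2))
```

In other words, a null in T1 acts as a wildcard, and a non-null T1 value must equal the T2 value, with the side effect that the literal 'x' in T1 also matches a null in T2.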

Tables T1 and T2 contain 243,000 and 55,000 records respectively. When I run the above statement, it takes 1426.809 seconds and returns a count of 11,349. That looks like poor performance. Is it because of the SUBSTR or because of the many NVLs in the WHERE clause?
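A rough upper bound on the work helps put the runtime in perspective. Assuming a plan that evaluates the WHERE clause for every row pair (the NVL conditions mix both tables inside expressions, so they cannot serve as plain join keys), the number of candidate pairs is:

```latex
\[
  243\,000 \times 55\,000 = 13\,365\,000\,000 \approx 1.34 \times 10^{10} \text{ row pairs}
\]
```

This is a worst case, not a measurement: Oracle may do better, for example with a hash join on substr(T1Col5, 1, 1) = T2Col5, and only the actual execution plan can confirm what happened.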

Can you help me improve the performance of this query, or is there a better way to do this matching?

Maybe it's just a personal preference, but I would write this as:

select count(*)
  from t1
      ,t2
 where (t1col1 is null or t1col1 = t2col1)
   and (t1col2 is null or t1col2 = t2col2)
   and (t1col3 is null or t1col3 = t2col3)
   and (t1col4 is null or t1col4 = t2col4)
   and substr(t1col5, 1, 1) = t2col5;
  • Clearer logic: a null value in a t1 column is explicitly accepted.
  • No NVL calculations are necessary.
  • It makes it possible to use indexes on the columns.
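To check that the rewrite returns the same count as the original on data that never contains the sentinel 'x', here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. Assumptions: SQLite has no NVL, so COALESCE replaces it in the original query; the sample rows and the helper count_matches are hypothetical. It only compares results; SQLite's behaviour says nothing about Oracle's execution plan or timing.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle. SQLite has no NVL, so the
# question's NVL(a, b) is written as COALESCE(a, b), which gives the same
# result for two arguments.
ORIGINAL_QUERY = """
SELECT COUNT(*) FROM t1, t2
WHERE COALESCE(t1col1, COALESCE(t2col1, 'x')) = COALESCE(t2col1, 'x')
  AND COALESCE(t1col2, COALESCE(t2col2, 'x')) = COALESCE(t2col2, 'x')
  AND COALESCE(t1col3, COALESCE(t2col3, 'x')) = COALESCE(t2col3, 'x')
  AND COALESCE(t1col4, COALESCE(t2col4, 'x')) = COALESCE(t2col4, 'x')
  AND substr(t1col5, 1, 1) = t2col5
"""

# The answer's rewrite.
REWRITTEN_QUERY = """
SELECT COUNT(*) FROM t1, t2
WHERE (t1col1 IS NULL OR t1col1 = t2col1)
  AND (t1col2 IS NULL OR t1col2 = t2col2)
  AND (t1col3 IS NULL OR t1col3 = t2col3)
  AND (t1col4 IS NULL OR t1col4 = t2col4)
  AND substr(t1col5, 1, 1) = t2col5
"""

# Hypothetical sample rows (None = SQL NULL); none of them contains 'x'.
T1_ROWS = [
    ("a", "b", None, None, "Kxyz"),
    ("a", None, None, None, "Mabc"),
    ("q", "b", "c", "d", "Kzzz"),
]
T2_ROWS = [
    ("a", "b", "c", "d", "K"),
    ("a", "z", None, None, "M"),
    ("q", "b", "c", "d", "K"),
]


def count_matches(t1_rows, t2_rows, query):
    """Load the rows into fresh in-memory tables and return the query's COUNT(*)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t1 (t1col1, t1col2, t1col3, t1col4, t1col5)")
        conn.execute("CREATE TABLE t2 (t2col1, t2col2, t2col3, t2col4, t2col5)")
        conn.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?, ?)", t1_rows)
        conn.executemany("INSERT INTO t2 VALUES (?, ?, ?, ?, ?)", t2_rows)
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print("original :", count_matches(T1_ROWS, T2_ROWS, ORIGINAL_QUERY))
    print("rewritten:", count_matches(T1_ROWS, T2_ROWS, REWRITTEN_QUERY))
```

On these rows both queries count 3 matching pairs: each T1 row matches exactly one T2 row, with its nulls acting as wildcards.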

The logic in this query is slightly different: it does not match the value 'x' in t1col1 to a null value in t2col1, whereas the original query does, because nvl(t2col1, 'x') turns that null into 'x'.
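The difference can be checked exhaustively for one column over a tiny value domain. This minimal sketch again assumes SQLite (Python's sqlite3 module) as a stand-in for Oracle, with COALESCE in place of NVL; the helpers column_conditions and differing_pairs are hypothetical. It relies on the fact that a WHERE clause keeps a row only when the condition is TRUE, so both FALSE and UNKNOWN (NULL) reject it.

```python
import itertools
import sqlite3

# Assumption: SQLite stands in for Oracle; COALESCE(a, b) plays the role of NVL(a, b).
_conn = sqlite3.connect(":memory:")

# First column: the question's NVL form. Second column: the answer's IS NULL / OR form.
_SQL = (
    "SELECT COALESCE(?, COALESCE(?, 'x')) = COALESCE(?, 'x'), "
    "(? IS NULL OR ? = ?)"
)


def column_conditions(t1_value, t2_value):
    """Return (nvl_form_passes, or_form_passes) for one column pair (None = SQL NULL).
    A result of 0 (FALSE) or None (UNKNOWN) means the row would be filtered out."""
    params = (t1_value, t2_value, t2_value, t1_value, t1_value, t2_value)
    nvl_result, or_result = _conn.execute(_SQL, params).fetchone()
    return nvl_result == 1, or_result == 1


def differing_pairs(values=(None, "a", "x")):
    """List every (t1, t2) combination where the two forms disagree."""
    result = []
    for t1, t2 in itertools.product(values, repeat=2):
        nvl_ok, or_ok = column_conditions(t1, t2)
        if nvl_ok != or_ok:
            result.append((t1, t2))
    return result


if __name__ == "__main__":
    print(differing_pairs())
```

Over this domain the only disagreement is T1 = 'x' with T2 null: the NVL form accepts it, while in the OR form 'x' = NULL evaluates to UNKNOWN and the row is dropped.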

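Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.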

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM