
SQL Wildcard Search - Efficiency?

There has been a debate at work recently about the most efficient way to search an MS SQL database using LIKE and wildcards. We are comparing %abc%, %abc, and abc%. One person has said that you should always have the wildcard at the end of the term (abc%). So, according to them, if we wanted to find something that ended in "abc", it would be most efficient to use reverse(column) LIKE reverse('%abc').

I set up a test using SQL Server 2008 (R2) to compare each of the following statements:

select * from CLMASTER where ADDRESS like '%STREET'
select * from CLMASTER where ADDRESS like '%STREET%'   
select * from CLMASTER where ADDRESS like reverse('TEERTS%')  
select * from CLMASTER where reverse(ADDRESS) like reverse('%STREET')

CLMASTER holds about 500,000 records. About 7,400 addresses end in "Street", and about 8,500 addresses contain "Street", but not necessarily at the end. Each test run took 2 seconds, and they all returned the same number of rows except for %STREET%, which found an extra 900 or so results because it picked up addresses that had an apartment number on the end.

Since the SQL Server test didn't show any difference in execution time, I moved to PHP, where I used the following code, switching in each statement, to run multiple tests quickly:

<?php

require_once("config.php");
$connection = odbc_connect( $connection_string, $U, $P );

// Duration of each run, in seconds.
$totaltime = array();

for ($i = 0; $i < 500; $i++) {
    // microtime(true) returns the current Unix timestamp as a float.
    $starttime = microtime(true);

    $Message = odbc_exec($connection, "select * from CLMASTER where ADDRESS like '%STREET%'");
    $Message = odbc_result($Message, 1);

    $endtime = microtime(true);

    $totaltime[] = ($endtime - $starttime);
}

odbc_close($connection);

echo "<b>Test took an average of:</b> ".round(array_sum($totaltime)/count($totaltime),8)." seconds per run.<br>";
echo "<b>Test took a total of:</b> ".round(array_sum($totaltime),8)." seconds to run.<br>";

?>

The results of this test were about as ambiguous as the results when testing in SQL Server.

%STREET completed in 166.5823 seconds (.3331 average per query), and averaged .0228 seconds per 500 results found.

%STREET% completed in 149.4500 seconds (.2989 average per query), and averaged .0177 seconds per 500 results found. (The time per result is better because it finds more results than the others in a similar time.)

reverse(ADDRESS) like reverse('%STREET') completed in 134.0115 seconds (.2680 average per query), and averaged .0183 seconds per 500 results found.

reverse('TEERTS%') completed in 167.6960 seconds (.3354 average per query), and averaged .0229 seconds per 500 results found.

We expected this test to show that %STREET% would be the slowest overall, while it actually ran faster than %STREET and had the best average time to return 500 results. The suggested reverse('%STREET') approach was the fastest to run overall, but was a little slower to return 500 results.

Extra fun: A coworker ran Profiler on the server while we were running the tests and found that the use of the double wildcard produced a significant increase in CPU usage, while the other tests were within 1-2% of each other.

Are there any SQL efficiency experts out there who can explain why having the wildcard at the end of the search string would be better practice than having it at the beginning, and perhaps why searching with wildcards at the beginning and end of the string was faster than having the wildcard just at the beginning?

Having the wildcard at the end of the string, like 'abc%', would help if that column were indexed, as the query would be able to seek directly to the records which start with 'abc' and ignore everything else. Having the wildcard at the beginning means it has to look at every row, regardless of indexing.

Good article here with more explanation.

Only a LIKE pattern whose wildcards come at the end of the character string will use an index.

You should look at using full-text search (FTS) CONTAINS if you want to improve the speed of searches that would otherwise need wildcards at the front and back of a character string. Also see this related SO post regarding CONTAINS versus LIKE.

According to Microsoft, it is more efficient to use only the closing wildcard because the query can then use an index, if one exists, rather than performing a scan. Think about how the search might work: if you have no idea what comes before the term, you have to scan everything, but if only the tail end is unknown, you can order the rows and possibly (depending on what you're looking for) do a quasi-binary search.

Some operators in joins or predicates tend to produce resource-intensive operations. The LIKE operator with a value enclosed in wildcards ("%a value%") almost always causes a table scan. This type of table scan is a very expensive operation because of the preceding wildcard. LIKE operators with only the closing wildcard can use an index because the index is part of a B+ tree, and the index is traversed by matching the string value from left to right.
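To make the left-to-right matching in the quote concrete, here is a minimal sketch in Python rather than SQL. It assumes a sorted list can stand in for an index; the function names and sample addresses are made up for illustration and are not from the original test. A trailing-wildcard search can binary-search to the first candidate and stop at the first non-match, while a leading-wildcard search has to look at every value.

```python
from bisect import bisect_left


def prefix_seek(sorted_values, prefix):
    """Emulate LIKE 'prefix%' against a sorted index.

    Binary-search to the first possible match, then read forward
    until a value no longer starts with the prefix.
    """
    start = bisect_left(sorted_values, prefix)
    matches = []
    touched = 0
    for value in sorted_values[start:]:
        touched += 1
        if not value.startswith(prefix):
            break
        matches.append(value)
    return matches, touched


def suffix_scan(values, suffix):
    """Emulate LIKE '%suffix': sort order does not help, so every value is read."""
    matches = [v for v in values if v.endswith(suffix)]
    return matches, len(values)


# Hypothetical sample data; sorting plays the role of the index.
addresses = sorted([
    "1 ELM ROAD", "12 MAIN STREET", "3 OAK AVENUE",
    "MAIN STREET 4", "MAPLE COURT 9", "MAPLE STREET",
])

seek_matches, seek_touched = prefix_seek(addresses, "MAIN")
scan_matches, scan_touched = suffix_scan(addresses, "STREET")

print(seek_matches, "values read:", seek_touched)
print(scan_matches, "values read:", scan_touched)
```

On six rows the difference is trivial, but the seek reads only the matching range plus one value, whereas the scan always reads the whole list.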

So, the above quote also explains why there was a huge processor spike when running two wildcards. It completed faster only by happenstance because there is enough horsepower to cover up the inefficiency. When trying to determine the performance of a query, you want to look at the execution of the query rather than the resources of the server, because those can be misleading. If I have a server with far more horsepower than the workload needs and I'm running queries on tables as small as 500,000 rows, the results are going to be misleading.

Apart from the fact that the Microsoft quote gives you your answer, when doing performance analysis, consider taking the dive into learning how to read the execution plan. It's an investment and very dry, but it will be worth it in the long run.

In short, though, whoever said that using only the trailing wildcard is more efficient is correct.
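The reverse(column) LIKE reverse('%abc') idea from the question only turns into a prefix search if the reversed values themselves are stored in sorted order. In SQL Server that would typically mean indexing something like a persisted computed column on REVERSE(ADDRESS), which is an assumption here and was not part of the test above. Without that, reverse() still has to be computed for every row, which would be consistent with the timings showing no clear winner. A rough sketch of the idea in Python, with hypothetical names and sample data:

```python
from bisect import bisect_left


def build_reversed_index(values):
    """Sorted (reversed value, original value) pairs.

    A stand-in for an index on REVERSE(column); hypothetical helper.
    """
    return sorted((v[::-1], v) for v in values)


def ends_with(reversed_index, suffix):
    """Answer LIKE '%suffix' as a prefix seek on the reversed keys."""
    key = suffix[::-1]
    # (key,) sorts before every (key + anything, original) pair.
    start = bisect_left(reversed_index, (key,))
    found = []
    for rev, original in reversed_index[start:]:
        if not rev.startswith(key):
            break
        found.append(original)
    return found


# Reversing the pattern moves the wildcard to the end.
reversed_pattern = "%STREET"[::-1]

addresses = ["12 MAIN STREET", "MAIN STREET 4", "MAPLE STREET", "3 OAK AVENUE"]
index = build_reversed_index(addresses)
street_endings = sorted(ends_with(index, "STREET"))

print(reversed_pattern)
print(street_endings)
```

The cost is a second sorted copy of the data, which is the same trade-off an extra index makes.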

1) In MS SQL, if you want the names that end with any one of the characters 'A', 'B', or 'C', you can use a query like the one below (suppose the table name is student):

select * from student where student_name like '%[ABC]'

So it will return the names that end with 'A', 'B', or 'C'.

2) If you want the names that start with 'A', 'B', or 'C':

select * from student where student_name like '[ABC]%'

3) If you want the names that contain 'A', 'B', or 'C' anywhere in them:

select * from student where student_name like '%[ABC]%' 
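Note that in T-SQL LIKE, [ABC] matches a single character from the set, not the string 'ABC'. The sketch below shows the difference by translating a small subset of LIKE syntax (%, _, and [set]) into a Python regular expression. It is an illustration only: like_to_regex and like are hypothetical helpers, the sample names are made up, and matching here is case-sensitive, which may differ from the collation of a real column.

```python
import re


def like_to_regex(pattern):
    """Translate a subset of LIKE syntax (%, _, [set]) into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%":
            parts.append(".*")          # any run of characters
        elif ch == "_":
            parts.append(".")           # exactly one character
        elif ch == "[":
            end = pattern.index("]", i + 1)
            # One character out of the listed set.
            parts.append("[" + re.escape(pattern[i + 1:end]) + "]")
            i = end
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True if the whole value matches the LIKE-style pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


names = ["ERICA", "JACOB", "ABCDE", "XABC", "DMITRI"]

ends_in_set = [n for n in names if like(n, "%[ABC]")]   # ends with A, B, or C
ends_in_abc = [n for n in names if like(n, "%ABC")]     # ends with the string ABC
starts_abc = [n for n in names if like(n, "ABC%")]      # starts with the string ABC
contains_abc = [n for n in names if like(n, "%ABC%")]   # contains the string ABC

print(ends_in_set, ends_in_abc, starts_abc, contains_abc)
```

So to find names that end with, start with, or contain the string 'ABC', the patterns would be '%ABC', 'ABC%', and '%ABC%', without the brackets.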

