What is the best optimization technique for a wildcard search through 100,000 records in a SQL table?

I am working on an ASP.NET MVC application that is used by 200 users. These users constantly (every 5 minutes) search for an item in a list of 100,000 items (the list grows by 1-2% every month). The list of 100,000 items is stored in a SQL Server table.

The search is a wildcard search, for example:

Select itemCode, itemName, ItemDesc 
from tblItems
Where itemName like '%SearchWord%'

The search needs to be really fast, since the main business relies on searching for and selecting items.

I would like to know how to get the best performance; the search results have to come up instantaneously.

What I have tried:

  1. I tried pre-loading all 100,000 records into memcache and then reading them from memcache, to avoid a call to SQL Server for every search.

    This takes a lot of time. Every time a user searches for an item, we retrieve all 100,000 records from memcache and then do the search. This takes almost 2-3 times longer than searching SQL Server directly.

  2. I tried searching the SQL Server table directly, but limiting the results to 50 records at a time (using TOP 50).

    This seems OK, but it is still nowhere near the performance we are seeking.

I would like to hear possible solutions, and any links to articles or code.

Thanks in advance

Run SQL Profiler with the Tuning template and feed the trace to the Database Engine Tuning Advisor. It will recommend indexes to create on your database.

Also, a query such as the following would be worth a try.

SELECT  *
FROM
(
    -- Number the matching rows; the ORDER BY decides which rows land on page 1
    SELECT    ROW_NUMBER() OVER (ORDER BY itemName) AS RowNumber, itemCode, itemName, ItemDesc
    FROM      tblItems
    WHERE     itemName LIKE '%SearchWord%'
) AS RowResults
WHERE   RowNumber BETWEEN 1 AND 50   -- first page: rows 1 to 50 inclusive
ORDER BY RowNumber

EDIT: Updated query to reflect your real scenario.

How about making a search without the leading wildcard your primary search? Unlike '%SearchWord%', a pattern with no leading wildcard can use an index seek on itemName:

Where itemName like 'SearchWord%'

and then having a "More Results" button that loads

Where itemName like '%SearchWord%'

(or, alternatively, exclude the rows already returned by the first query):

Where itemName not like 'SearchWord%' and itemName like '%SearchWord%'
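The prefix-first search with a "More Results" follow-up can be sketched in Python with the built-in sqlite3 module. This is a minimal illustration, not SQL Server code: SQLite stands in for SQL Server, and the helper names (make_demo_db, first_results, more_results) are hypothetical. It shows only the two queries; whether the prefix pattern gets an index seek is up to the SQL Server optimiser and isn't demonstrated here.

```python
import sqlite3


def make_demo_db(names) -> sqlite3.Connection:
    """Hypothetical in-memory stand-in for tblItems (SQLite, not SQL Server)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblItems (itemCode INTEGER PRIMARY KEY, itemName TEXT NOT NULL)")
    conn.executemany("INSERT INTO tblItems (itemName) VALUES (?)", [(n,) for n in names])
    return conn


def first_results(conn, word, limit=50):
    """Primary search: itemName LIKE 'word%' (no leading wildcard)."""
    # Escaping of % and _ inside the search word is omitted for brevity.
    sql = "SELECT itemName FROM tblItems WHERE itemName LIKE ? ORDER BY itemName LIMIT ?"
    return [row[0] for row in conn.execute(sql, (word + "%", limit))]


def more_results(conn, word, limit=50):
    """'More Results': contains-match minus the rows the prefix search already returned."""
    sql = ("SELECT itemName FROM tblItems "
           "WHERE itemName NOT LIKE ? AND itemName LIKE ? ORDER BY itemName LIMIT ?")
    return [row[0] for row in conn.execute(sql, (word + "%", "%" + word + "%", limit))]
```

Splitting the work this way means the common case (the user typed the start of the name) only touches the cheap query, and the full '%word%' scan runs only when the user asks for more.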

A weird alternative which might work, although it depends on several assumptions. Sorry it isn't fully explained; I'm on an iPad, so it's hard to type. (And yes, this solution has been used in high-transaction commercial systems.)

This assumes:

  1. That your query is CPU-bound, not I/O-bound
  2. That itemName is not so long that it contains every letter and digit
  3. That the search word, in total, contains enough selective characters and isn't made up only of very common ones
  4. That your selection is driven by a %like% predicate

The basic idea is to expand your query so the optimiser knows which rows actually need the LIKE scan.

Step 1. Set up your table

Create 26 or 36 additional columns, one for each letter (or each letter and digit). When I've done this for real it has always been in a separate table, but putting them on the source table should be OK for a small volume like 100k rows. Let's call the columns trig_a, trig_b, etc.

Create a trigger for insert/update/delete that puts a 1 into trig_a if the field contains an 'a' and a 0 otherwise, and does the same for all 26/36 columns. The trigger to do this is complex, but possible (at least in Oracle). If you get stuck, I'm sure people on SO can write it, or I can dig mine out.

At this point, we have a series of columns that indicate whether a field contains a letter/digit etc.
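A minimal sketch of Step 1, assuming SQLite (through Python's sqlite3) as a stand-in for SQL Server or Oracle, with the flag columns on the source table. The function name create_schema and the trigger names are hypothetical. Because the flags live in the same row, deleting the row removes them, so only insert and update triggers are needed here; the separate-table variant would also need a delete trigger.

```python
import sqlite3
import string

# Monitored characters: 26 letters + 10 digits, one trig_? column each.
CHARS = string.ascii_lowercase + string.digits


def create_schema(conn: sqlite3.Connection, chars: str = CHARS) -> None:
    """Create tblItems with one 0/1 flag column per character, plus triggers
    that keep the flags in sync with itemName (SQLite syntax, not T-SQL)."""
    flag_cols = ", ".join(f"trig_{c} INTEGER NOT NULL DEFAULT 0" for c in chars)
    conn.execute(
        "CREATE TABLE tblItems (itemCode INTEGER PRIMARY KEY, "
        f"itemName TEXT NOT NULL, {flag_cols})"
    )
    # trig_a = 1 if lower(itemName) contains 'a', else 0; lower() handles case.
    assignments = ", ".join(
        f"trig_{c} = (instr(lower(itemName), '{c}') > 0)" for c in chars
    )
    for name, event in (("trg_flags_ins", "INSERT"), ("trg_flags_upd", "UPDATE OF itemName")):
        # The trigger's own UPDATE never sets itemName, so it does not re-fire itself.
        conn.execute(
            f"CREATE TRIGGER {name} AFTER {event} ON tblItems BEGIN "
            f"UPDATE tblItems SET {assignments} WHERE itemCode = NEW.itemCode; END"
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.execute("INSERT INTO tblItems (itemName) VALUES ('Bolt 10')")
    print(conn.execute("SELECT trig_b, trig_z, trig_1 FROM tblItems").fetchone())  # (1, 0, 1)
```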

Step 2. Help your query

With this extra information, we are in a position to help the optimiser. Add the following to your query:

SELECT itemCode, itemName, ItemDesc
FROM   tblItems
WHERE  itemName LIKE '%' + @SearchWord + '%'
  AND  ((trig_a > 0) OR (@SearchWord NOT LIKE '%a%'))
  AND  ((trig_b > 0) OR (@SearchWord NOT LIKE '%b%'))
  -- ... repeat for every monitored trig_? column ...

If the optimiser behaves, it can use the (hopefully) cheaper trig_? > 0 predicates to cut down the number of rows on which the LIKE has to be evaluated.
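Step 2 can be sketched the same way. This is an illustration under assumptions: SQLite stands in for SQL Server (so the search word is the named parameter :word rather than @SearchWord), the flag columns are filled in Python instead of by a trigger, and flag_filter, search and demo_db are hypothetical names. Whether the cheap flag tests actually run before the LIKE is up to the engine's optimiser, which is what note 1 below warns about.

```python
import sqlite3
import string

CHARS = string.ascii_lowercase + string.digits


def flag_filter(chars: str = CHARS) -> str:
    """Build the Step 2 predicates: keep a row only if it contains the character
    or the search word does not need it."""
    # SQLite's LIKE is case-insensitive for ASCII by default, so :word = 'BOLT'
    # still requires trig_b > 0.
    return " AND ".join(f"(trig_{c} > 0 OR :word NOT LIKE '%{c}%')" for c in chars)


def search(conn: sqlite3.Connection, word: str, chars: str = CHARS) -> list:
    sql = (
        f"SELECT itemName FROM tblItems WHERE {flag_filter(chars)} "
        "AND itemName LIKE '%' || :word || '%' ORDER BY itemName"
    )
    return [row[0] for row in conn.execute(sql, {"word": word})]


def demo_db(names, chars: str = CHARS) -> sqlite3.Connection:
    """Hypothetical tblItems whose flags are computed in Python (a trigger would do this)."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"trig_{c} INTEGER NOT NULL" for c in chars)
    conn.execute(f"CREATE TABLE tblItems (itemName TEXT NOT NULL, {cols})")
    placeholders = ", ".join("?" * (len(chars) + 1))
    rows = [[name] + [int(c in name.lower()) for c in chars] for name in names]
    conn.executemany(f"INSERT INTO tblItems VALUES ({placeholders})", rows)
    return conn
```

Because the flag tests are plain integer comparisons, a row that lacks any character of the search word can be rejected without running the substring match at all.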

Notes:

  1. You may need to force the optimiser to scan the trig_? fields first
  2. Indexes on the trig_? fields can help, especially if they are in the source table
  3. I haven't shown how to handle upper/lower case; don't forget to handle it
  4. You might find that doing just a few letters is all you need
  5. This technique doesn't offer performance gains for every use of LIKE, so it isn't a general-purpose technique for everywhere you use a LIKE.
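For note 4, one hypothetical way to choose which few letters to keep is to favour the characters that appear in the fewest item names, since those make trig_? > 0 most selective. This sketch assumes that rarity in itemName is a good proxy. It ignores how often each character appears in users' search words, which a real choice should also weigh. The helper name rarest_chars is made up for illustration.

```python
import string
from collections import Counter


def rarest_chars(names, k=5, chars=string.ascii_lowercase + string.digits):
    """Hypothetical helper: the k monitored characters that occur in the fewest
    item names, ignoring characters that never occur at all (a flag that is 0
    on every row only helps searches that cannot match anything anyway)."""
    monitored = set(chars)
    counts = Counter()
    for name in names:
        # Count each character at most once per name, case-insensitively.
        counts.update(set(name.lower()) & monitored)
    present = [c for c in chars if counts[c] > 0]
    # Fewest names first; ties broken by character order.
    return sorted(present, key=lambda c: (counts[c], c))[:k]
```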
