
Optimizing a Mysql search query

I have a search query which I'm trying to optimize. I'm pretty new to MySQL, so can someone explain how to optimize this type of query with multiple joins?

SELECT cust.*, br.branchcode, br.branchname, over.branchcode override_branchcode, over.branchname override_branchname
                    FROM ( SELECT id, CONCAT( firstName, ' ', lastName ) fullName, firstname, lastname, phone1, phone2, mobile1, mobile2, unit, brgy, city, `primary`, override_pst
                    FROM sl_customers ) cust
                    LEFT JOIN sl_branches br ON cust.primary = br.id
                    LEFT JOIN sl_branches over ON cust.override_pst = over.id
                    WHERE fullName LIKE '{$searchtext}' OR firstname LIKE '%{$searchtext}%' OR lastname LIKE '%{$searchtext}%'

For some reason it's running awfully slow and I'm not sure where to begin cutting the fat.

Even if you have proper indexes on first_name and last_name, once you CONCAT them those indexes are meaningless: the concatenated value has to be computed for every row before it can be compared.

An approach that has given me good results (across millions of records) is a combination of application logic and SQL. Assuming the full name is always the first and last names joined by a space, you can split the search text by its spaces at the application level. The number of spaces in the search text then determines what sort of query you execute.

Firstly, add an index across both columns, e.g.:

ALTER TABLE `sl_customers` ADD INDEX idx_name_search (`first_name`,`last_name`);

Then, build every way of splitting the space-delimited search text into a first-name part and a last-name part. Here's a working PHP example:

$search_text = 'millhouse van houten';
$params = [];      // bound values, two per condition
$conditions = [];  // one "(first AND last)" clause per split point

$parts = explode(' ', $search_text);

// Try every split point, from "all words are the first name"
// down to "all words are the last name".
for ($i = count($parts); $i >= 0; $i--) {
    $params[] = implode(' ', array_slice($parts, 0, $i)) . '%'; // first name
    $params[] = implode(' ', array_slice($parts, $i)) . '%';    // last name

    $conditions[] = '(`first_name` LIKE ? AND `last_name` LIKE ?)';
}

// Joining with OR avoids having to trim a trailing "OR ".
$query = 'SELECT `first_name`, `last_name` FROM `customer` WHERE '
       . implode(' OR ', $conditions);

You end up with a query like:

SELECT `first_name`, `last_name` FROM `customer` WHERE 
(`first_name` LIKE ? AND `last_name` LIKE ?) OR 
(`first_name` LIKE ? AND `last_name` LIKE ?) OR 
(`first_name` LIKE ? AND `last_name` LIKE ?) OR 
(`first_name` LIKE ? AND `last_name` LIKE ?);

and parameters like

[0] => millhouse van houten%
[1] => %
[2] => millhouse van%
[3] => houten%
[4] => millhouse%
[5] => van houten%
[6] => %
[7] => millhouse van houten%

This will search for a set of combinations like this:

first_name             | last_name
-------------------------------------------------
millhouse van houten%  | %
millhouse van%         | houten%
millhouse%             | van houten%
%                      | millhouse van houten%

Bear in mind that in most cases, there will only actually be one space in the full name, so there will be fewer comparisons than in my example.

You may want to play with the wildcards. As long as you keep an index on (first_name, last_name) and another on last_name (for the split where the first name is just %), each of the OR'd conditions has an index it can use. A wildcard at the start of a LIKE pattern prevents the index from being used for a range lookup.
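To see why the position of the wildcard matters, here is a minimal Python sketch that turns a LIKE pattern into the bounds a B-tree range scan could seek between. It is a toy model, not MySQL's optimizer. `like_prefix` and `like_to_range` are hypothetical helper names, and the model assumes plain code-point ordering, ignoring collations and the ESCAPE clause.

```python
MAX_CHAR = "\U0010ffff"  # highest code point; it cannot be incremented


def like_prefix(pattern: str) -> str:
    """Literal text before the first LIKE wildcard ('%' or '_')."""
    for i, ch in enumerate(pattern):
        if ch in "%_":
            return pattern[:i]
    return pattern


def like_to_range(pattern: str):
    """Bounds (low, high) an index range scan could use for `col LIKE pattern`.

    Returns None when the pattern starts with a wildcard. In that case there is
    no prefix to seek to, so every entry has to be examined. A high bound of
    None means the range is open-ended. Rows inside the range still have to be
    checked against the full pattern afterwards.
    """
    low = like_prefix(pattern)
    if not low:
        return None
    stem = low.rstrip(MAX_CHAR)
    if not stem:
        return (low, None)
    # Any string starting with `low` sorts before the prefix with its last
    # incrementable character bumped by one.
    high = stem[:-1] + chr(ord(stem[-1]) + 1)
    return (low, high)


if __name__ == "__main__":
    for p in ["millhouse van%", "van houten%", "%", "%millhouse%"]:
        print(repr(p), "->", like_to_range(p))
```

For the split patterns above, `millhouse van%` maps to the range ('millhouse van', 'millhouse vao'). Both `%` and `%millhouse%` map to None, which means a scan.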

Sorry for the lengthy answer - I just wanted to make the idea as clear as possible.

Names are something people expect to be able to search on, and do so efficiently.

Skip the hokey concatenation and maintain a proper "full name" column in your table. Put an index on that, and even partial matches can run efficiently with just an index scan. At the moment, you're spitting in the query engine's face by giving it calculated expressions which it can never optimize.
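Here is a rough Python model of what a maintained full-name column buys you. It assumes the index behaves like a sorted list of (full_name, id) pairs. `FullNameIndex` is a hypothetical name, and the comparisons are case-sensitive, unlike MySQL's default collations. A prefix search seeks straight to the first candidate and reads only the matching run of entries. A `%fragment%` search still reads every entry, but it reads them from the narrow index rather than from the whole table.

```python
import bisect


class FullNameIndex:
    """Toy stand-in for an index on a maintained full_name column."""

    def __init__(self):
        self._entries = []  # (full_name, customer_id) pairs, kept sorted

    def add(self, customer_id, first, last):
        # full_name is built once, when the row is written,
        # instead of with CONCAT() on every search.
        full_name = f"{first} {last}"
        bisect.insort(self._entries, (full_name, customer_id))

    def prefix_search(self, prefix):
        """Like `full_name LIKE 'prefix%'`: seek to the prefix, read forward."""
        ids = []
        # (prefix,) sorts before every (name, id) pair whose name is >= prefix.
        i = bisect.bisect_left(self._entries, (prefix,))
        while i < len(self._entries) and self._entries[i][0].startswith(prefix):
            ids.append(self._entries[i][1])
            i += 1
        return ids

    def contains_search(self, fragment):
        """Like `full_name LIKE '%fragment%'`: every entry must be read."""
        return [cid for name, cid in self._entries if fragment in name]
```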

Once you can match a partial in FULL_NAME, you shouldn't need to even bother with separate OR clauses on FIRST or LAST. (ORs are inefficient, by the way.)

And as Michael says, write the structure of your query properly: the customers table can simply be joined, and it doesn't need to be a subquery.

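select cust.*, br.*, ovr.*            -- you can put in the specific columns.
from sl_customers cust
left join sl_branches br on cust.primary = br.id            -- LEFT keeps customers with no branch
left join sl_branches ovr on cust.override_pst = ovr.id     -- OVER is a reserved word in MySQL 8.0+
where cust.full_name like '%{$searchtext}%';                -- assumes the new full_name column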

Give the poor MySQL optimizer something it can actually index & work with effectively, and it will almost certainly give you decent performance.

See: http://kristiannielsen.livejournal.com/802.html

One big problem with the performance of your query is the inline view (aliased as cust). MySQL calls it a "derived table", which is an apt name because of how MySQL handles it, at least before version 5.7, which added the ability to merge a derived table into the outer query. MySQL runs that query, stores the result as a temporary MyISAM table, and then runs the outer query against that. Because there are no predicates in the view query, MySQL is essentially creating a copy of the customers table each time the query is run.

It would be much better, from a performance standpoint, to move the search predicates from the outer query, into the query in the inline view:

SELECT cust.*
     , br.branchcode
     , br.branchname
     , ovr.branchcode override_branchcode
     , ovr.branchname override_branchname
  FROM ( SELECT s.id
              , CONCAT(s.firstName,' ',s.lastName) fullName
              , s.firstname
              , s.lastname
              , s.phone1
              , s.phone2
              , s.mobile1
              , s.mobile2
              , s.unit
              , s.brgy
              , s.city
              , s.primary
              , s.override_pst
           FROM sl_customers s
          WHERE CONCAT(s.firstName,' ',s.lastName) LIKE '{$searchtext}'
             OR s.firstname LIKE '%{$searchtext}%'
             OR s.lastname  LIKE '%{$searchtext}%'
       ) cust
  LEFT 
  JOIN sl_branches br
    ON cust.primary = br.id
  LEFT
  JOIN sl_branches ovr   -- "over" is a reserved word in MySQL 8.0+, so use another alias
    ON cust.override_pst = ovr.id

At least that would likely be a smaller number of rows to copy into the "derived table", though MySQL still has to materialize that view query, and then run another query on that.

To improve performance further, we can eliminate the inline view entirely:

SELECT s.id
     , CONCAT(s.firstName,' ',s.lastName) fullName
     , s.firstname
     , s.lastname
     , s.phone1
     , s.phone2
     , s.mobile1
     , s.mobile2
     , s.unit
     , s.brgy
     , s.city
     , s.primary
     , s.override_pst
     , br.branchcode
     , br.branchname
     , ovr.branchcode override_branchcode
     , ovr.branchname override_branchname
  FROM sl_customers s           
  LEFT 
  JOIN sl_branches br
    ON s.primary = br.id
  LEFT
  JOIN sl_branches ovr
    ON s.override_pst = ovr.id
 WHERE CONCAT(s.firstName,' ',s.lastName) LIKE '{$searchtext}'
    OR s.firstname LIKE '%{$searchtext}%'
    OR s.lastname  LIKE '%{$searchtext}%'
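To check that moving the predicates and then removing the inline view doesn't change which rows come back, here is a small Python sketch that uses an in-memory SQLite database as a stand-in. It checks results only, not speed. SQLite has its own optimizer and does not reproduce MySQL's derived-table handling. The trimmed schema, the made-up rows, and the use of `||` in place of `CONCAT()` are assumptions made for the sketch.

```python
import sqlite3

# Trimmed stand-in schema; "primary" is quoted because it is a keyword
# in SQLite as well as in MySQL.
SCHEMA = """
CREATE TABLE sl_branches (id INTEGER PRIMARY KEY, branchcode TEXT, branchname TEXT);
CREATE TABLE sl_customers (
    id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT,
    "primary" INTEGER, override_pst INTEGER
);
"""

# Derived-table form, with the search predicates inside the inline view.
DERIVED = """
SELECT cust.id, cust.fullName, br.branchcode, ovr.branchcode AS override_branchcode
  FROM (SELECT s.id AS id, s.firstname || ' ' || s.lastname AS fullName,
               s."primary" AS "primary", s.override_pst AS override_pst
          FROM sl_customers s
         WHERE s.firstname || ' ' || s.lastname LIKE ?
            OR s.firstname LIKE ?
            OR s.lastname LIKE ?) cust
  LEFT JOIN sl_branches br ON cust."primary" = br.id
  LEFT JOIN sl_branches ovr ON cust.override_pst = ovr.id
"""

# Flattened form, with no inline view at all.
FLAT = """
SELECT s.id, s.firstname || ' ' || s.lastname AS fullName,
       br.branchcode, ovr.branchcode AS override_branchcode
  FROM sl_customers s
  LEFT JOIN sl_branches br ON s."primary" = br.id
  LEFT JOIN sl_branches ovr ON s.override_pst = ovr.id
 WHERE s.firstname || ' ' || s.lastname LIKE ?
    OR s.firstname LIKE ?
    OR s.lastname LIKE ?
"""


def build_db():
    """In-memory database with a few made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO sl_branches VALUES (?, ?, ?)",
                    [(1, "MNL", "Manila"), (2, "CEB", "Cebu")])
    con.executemany("INSERT INTO sl_customers VALUES (?, ?, ?, ?, ?)",
                    [(1, "Millhouse", "Van Houten", 1, None),
                     (2, "Bart", "Simpson", 2, 1),
                     (3, "Lisa", "Simpson", None, None)])
    return con


def same_results(con, search_text):
    """Run both forms with the same bound values; return (match, rows)."""
    params = (search_text, f"%{search_text}%", f"%{search_text}%")
    derived = sorted(con.execute(DERIVED, params).fetchall())
    flat = sorted(con.execute(FLAT, params).fetchall())
    return derived == flat, flat
```

Binding the search text as parameters, as done here, also avoids interpolating `{$searchtext}` straight into the SQL.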

The next "big rock" in terms of performance is that none of the predicates are sargable. That is, MySQL can't use a range scan for any of them: the LIKE predicates on the columns have a leading '%', and the CONCAT expression has to be evaluated for every row.

A full table scan is likely the fastest you are going to get with this query. You might be able to get MySQL to use an index on sl_customers (firstname, lastname), but that's not likely to improve performance if the table and index are already in memory. It only pays off when a small subset of rows needs to be fetched from the table, because each row found through an index lookup is fetched from the underlying table with slower random reads.

When searchtext is an empty string (so the '%%' patterns match every row), a full table scan is likely going to be the fastest.

When searchtext doesn't match any rows, a full index scan is likely going to be faster.

You would really have to test the performance.

(It's likely you already have indexes on the id columns of the other two tables, since the id column is likely the PRIMARY KEY for those tables. If that's not the case, then you definitely want to have an index defined on those tables, with id as the leading column, to improve join performance.)

Put the word EXPLAIN in front of the query and evaluate the results. Look for rows with type ALL (a full table scan), large rows estimates, and derived tables; those are the steps that make the query take longer. Then add indexes (new keys) that let those steps use an index lookup instead.
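Here is a small Python sketch of what to look for in EXPLAIN's tabular output. `rows_as_dicts` turns the result of any executed DB-API cursor into dicts keyed by column name. `flag_explain_rows` is a hypothetical helper that checks a few well-known signals in MySQL's EXPLAIN columns:

- a `type` of ALL (full table scan) or index (full index scan)
- a `select_type` of DERIVED
- "Using temporary" or "Using filesort" in Extra

The sample rows are invented to show the shape of the input. They are not output from your query.

```python
def rows_as_dicts(cursor):
    """Fetch all rows from an executed DB-API cursor as dicts keyed by column name."""
    names = [col[0] for col in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


def flag_explain_rows(explain_rows):
    """Return warnings for rows of MySQL's tabular EXPLAIN output.

    Checks only a few well-known signals; it is a reading aid, not a tuner.
    """
    warnings = []
    for row in explain_rows:
        table = row.get("table")
        access = row.get("type")
        if access == "ALL":
            warnings.append(f"{table}: full table scan, ~{row.get('rows')} rows examined")
        elif access == "index":
            warnings.append(f"{table}: full index scan")
        if row.get("select_type") == "DERIVED":
            warnings.append(f"{table}: read to build a derived table (materialized)")
        extra = row.get("Extra") or ""
        for note in ("Using temporary", "Using filesort"):
            if note in extra:
                warnings.append(f"{table}: {note}")
    return warnings


# Invented sample rows showing the shape of EXPLAIN output for the
# derived-table version of the query; not measured output.
SAMPLE = [
    {"id": 1, "select_type": "PRIMARY", "table": "<derived2>",
     "type": "ALL", "key": None, "rows": 50000, "Extra": None},
    {"id": 1, "select_type": "PRIMARY", "table": "br",
     "type": "eq_ref", "key": "PRIMARY", "rows": 1, "Extra": None},
    {"id": 2, "select_type": "DERIVED", "table": "sl_customers",
     "type": "ALL", "key": None, "rows": 50000, "Extra": None},
]

if __name__ == "__main__":
    for warning in flag_explain_rows(SAMPLE):
        print(warning)
```

With a real MySQL driver you would run `cursor.execute("EXPLAIN " + query, params)` and pass `rows_as_dicts(cursor)` to the checker. The placeholder style in `query` depends on the driver.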
