Optimizing a MySQL query for better performance

Question

I have the following query:

SELECT o.order_id,
       p.pre_sale_phone_manual_id AS id,
       p.created,
       p.user_id
FROM `order` o
LEFT JOIN `customer` c ON c.customer_id = o.customer_id,
                          `pre_sale_phone_manual` p
LEFT JOIN `pre_sale_phone_manual` p1 ON p.pre_sale_phone_manual_id=p1.pre_sale_phone_manual_id
AND p.created > p1.created
WHERE p1.user_id IS NULL
  AND p.phone <> ""
  AND REPLACE(REPLACE(REPLACE(REPLACE(c.phone, "-", ""), ".", ""), "+", ""), " ", "") LIKE CONCAT('%', RIGHT(REPLACE(REPLACE(REPLACE(REPLACE(p.phone, "-", ""), ".", ""), "+", ""), " ", ""), 10))
  AND o.created > p.created
  AND o.created < (DATE_ADD(p.created, INTERVAL 183 DAY))
  AND o.created > '2013-12-30 08:28:37'

The query matches customers' phone numbers against entries in the pre_sale_phone_manual table. The pre_sale_phone_manual record should be dated before the order, should fall within 6 months (183 days) of it, and should be the first entry in pre_sale_phone_manual for that phone, because other users can create duplicate entries.

As far as I can tell, the slowness comes from the join between the order table and the pre_sale_phone_manual table: there is no one-to-one join condition, so both tables are scanned in full, and the INTERVAL 183 DAY range check obviously adds to the cost.

The following is the EXPLAIN output for the query:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: o
         type: ALL
possible_keys: order_created_index,fk_order_customer
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 110658
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: p
         type: ALL
possible_keys: created,phone
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2053
        Extra: Using where; Using join buffer
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: p1
         type: eq_ref
possible_keys: PRIMARY,created
          key: PRIMARY
      key_len: 4
          ref: 463832_yii_adm_t4f.p.pre_sale_phone_manual_id
         rows: 1
        Extra: Using where; Not exists
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: c
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: 463832_yii_adm_t4f.o.customer_id
         rows: 1
        Extra: Using where

The following stats are from the MySQL slow query log:

Query_time: 126.038395  Lock_time: 0.000303 Rows_sent: 72  Rows_examined: 15266616

The following fields are already indexed:

order.created
pre_sale_phone_manual.created
pre_sale_phone_manual.phone
and PKs and FKs with _id suffix

Please help me optimize the query, and thanks for your time.

Answer 1

There are a few performance "killers":

1. The Cartesian product of num-rows-of(customer) * num-rows-of(pre_sale_phone_manual)
2. The inefficient method of matching c.phone to p.phone
3. Trying to locate the first record per phone in pre_sale_phone_manual using a LEFT JOIN

(Are you trying to find the first record in pre_sale_phone_manual for each phone? I think that is what the code is doing, so I have assumed this is the case.)

I can't easily solve item 2: it seems your phone columns can't be trusted 100%. If that problem were solved, though, the query might (I think) be:

SELECT
      o.order_id
    , p.id /* the derived table p exposes pre_sale_phone_manual_id as id */
    , p.created
    , p.user_id
FROM `order` o
      INNER JOIN `customer` c
            ON c.customer_id = o.customer_id
      INNER JOIN (
            SELECT
                  pspm.pre_sale_phone_manual_id AS id
                , pspm.created
                , pspm.user_id
                , pspm.phone
            FROM `pre_sale_phone_manual` pspm
                  INNER JOIN (
                        SELECT
                              phone
                            , MIN(created) AS created
                        FROM `pre_sale_phone_manual`
                        GROUP BY
                              phone
                  ) dc
                        ON pspm.created = dc.created 
                        AND pspm.phone = dc.phone
      ) p
            ON c.phone = p.phone /* see notes on this join */
WHERE o.created > p.created
      AND o.created < DATE_ADD(p.created, INTERVAL 183 DAY)
      AND o.created > '2013-12-30 08:28:37'

Notes on the phone = phone join (untrustworthy phone columns)

There is not a lot a query developer can do unless they also have control over the tables. One method would be to add columns that ARE reliable and index those new columns. MySQL does not have function-based indexes or computed columns that I'm aware of, so arriving at reliable data is not simple.
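The remark about computed columns reflects MySQL as it stood when this answer was written. MySQL 5.7.6 and later support generated columns, and these can be indexed, so on a newer server the normalized value could be declared once instead of being maintained by hand. A minimal sketch, assuming MySQL 5.7.6+ and a hypothetical column name phone_last10 (neither the column nor the index name comes from the question):

```sql
-- Assumption: MySQL 5.7.6 or later. phone_last10 and the index name are
-- hypothetical; they do not appear in the original schema.
ALTER TABLE customer
  ADD COLUMN phone_last10 VARCHAR(10)
    GENERATED ALWAYS AS (
      RIGHT(REPLACE(REPLACE(REPLACE(REPLACE(phone, '-', ''), '.', ''), '+', ''), ' ', ''), 10)
    ) STORED,
  ADD INDEX idx_customer_phone_last10 (phone_last10);
```

With the same column added to pre_sale_phone_manual, the join could compare c.phone_last10 = p.phone_last10. That matches the original LIKE test only when the pre_sale_phone_manual number still has at least 10 characters after stripping; shorter numbers would need separate handling.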

This previous question holds a function that may be useful, for example if you added a good_phone column to customer:

/*
Function from user1467716
https://stackoverflow.com/questions/287105/mysql-strip-non-numeric-characters-to-compare
*/

-- Switch the client delimiter so the function body can contain semicolons.
DELIMITER //

CREATE FUNCTION STRIP_NON_DIGIT(input VARCHAR(255))
   RETURNS VARCHAR(255)
   DETERMINISTIC
BEGIN
   DECLARE output   VARCHAR(255) DEFAULT '';
   DECLARE iterator INT          DEFAULT 1;
   -- Walk the input one character at a time and keep only the digits.
   WHILE iterator < (LENGTH(input) + 1) DO
      IF SUBSTRING(input, iterator, 1) IN ( '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' ) THEN
         SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
      END IF;
      SET iterator = iterator + 1;
   END WHILE;
   RETURN output;
END
//

DELIMITER ;

-- Populate the new column (assumes good_phone has already been added to customer).
UPDATE customer
SET good_phone = STRIP_NON_DIGIT(phone);

If you aren't able to fix the unreliable phone data, then you suffer the performance hit that implies, and instead of "phone = phone" you will need to continue with:

AND REPLACE(REPLACE(REPLACE(REPLACE(c.phone, "-", ""), ".", ""), "+", ""), " ", "") etc.

Answer 2

So, just to repeat what others and I have already written:

You are actually doing an expensive CROSS JOIN with pre_sale_phone_manual. All rows on the left side are combined with all rows on the right side. That's a lot of rows.
Despite the LEFT JOIN on customer, you are in fact already doing an INNER JOIN, because of a WHERE condition (the LIKE condition).
You would benefit from normalizing phone numbers.
LIKE conditions do not fully benefit from indexes if the pattern starts with a wildcard '%'. (They can benefit to some extent if the index is small enough to fit in primary memory, since the index scan will be quicker. But it will still be O(n) rather than O(log(n)).)
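The difference between a trailing and a leading wildcard can be checked with EXPLAIN. A small sketch against the customer table, assuming an index exists on customer.phone (the question does not list one) and using a made-up number:

```sql
-- Assumption: customer.phone is indexed; '5551234' is a made-up value.

-- Pattern with a constant prefix: the optimizer can consider a range scan
-- on the index.
EXPLAIN SELECT customer_id FROM customer WHERE phone LIKE '5551234%';

-- Pattern with a leading wildcard: no range scan is possible, so every
-- row (or every index entry) has to be examined.
EXPLAIN SELECT customer_id FROM customer WHERE phone LIKE '%5551234';
```

In the original query the column is additionally wrapped in REPLACE(...), which on its own already prevents a plain index on phone from being used for the comparison.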

I have made a trivial, obviously untested, rewrite under the assumption that the OUTER JOINs and the CROSS JOIN are not required, i.e. that you always have a record in pre_sale_phone_manual. You could try it out if the assumption is valid.

SELECT o.order_id,
       p.pre_sale_phone_manual_id AS id,
       p.created,
       p.user_id
FROM `order` o
JOIN `customer` c ON c.customer_id = o.customer_id
JOIN `pre_sale_phone_manual` p
LEFT JOIN `pre_sale_phone_manual` p1 
    ON p.pre_sale_phone_manual_id=p1.pre_sale_phone_manual_id
    AND p.created > p1.created
WHERE p1.user_id IS NULL
  AND p.phone <> ""
  AND REPLACE(REPLACE(REPLACE(REPLACE(c.phone, "-", ""), ".", ""), "+", ""), " ", "") 
      LIKE CONCAT('%', RIGHT(REPLACE(REPLACE(REPLACE(REPLACE(p.phone, "-", ""), ".", ""), "+", ""), " ", ""), 10))
  AND o.created > p.created
  AND o.created < (DATE_ADD(p.created, INTERVAL 183 DAY))
  AND o.created > '2013-12-30 08:28:37'

Traditionally, JOINs have been preferred in MySQL because of performance issues with subqueries in older versions. However, you could also try and see what happens if you use NOT EXISTS (...) instead of LEFT JOIN ... p1.
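The NOT EXISTS variant might look roughly like the following. This is an untested sketch that keeps the same pre_sale_phone_manual_id comparison as the original query; it returns the same rows as the LEFT JOIN ... IS NULL form only under the assumption that user_id is never NULL in pre_sale_phone_manual.

```sql
-- Sketch only: assumes pre_sale_phone_manual.user_id is declared NOT NULL,
-- so that "p1.user_id IS NULL" in the original means "no matching p1 row".
SELECT o.order_id,
       p.pre_sale_phone_manual_id AS id,
       p.created,
       p.user_id
FROM `order` o
JOIN `customer` c ON c.customer_id = o.customer_id
CROSS JOIN `pre_sale_phone_manual` p
WHERE NOT EXISTS (
        SELECT 1
        FROM `pre_sale_phone_manual` p1
        WHERE p1.pre_sale_phone_manual_id = p.pre_sale_phone_manual_id
          AND p1.created < p.created
      )
  AND p.phone <> ''
  AND REPLACE(REPLACE(REPLACE(REPLACE(c.phone, '-', ''), '.', ''), '+', ''), ' ', '')
      LIKE CONCAT('%', RIGHT(REPLACE(REPLACE(REPLACE(REPLACE(p.phone, '-', ''), '.', ''), '+', ''), ' ', ''), 10))
  AND o.created > p.created
  AND o.created < DATE_ADD(p.created, INTERVAL 183 DAY)
  AND o.created > '2013-12-30 08:28:37';
```

Comparing the EXPLAIN output of both forms on the real data is the only reliable way to tell which one the optimizer handles better.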

Answer 3

First, you have mixed implicit and explicit joins. Just for readability, use an explicit INNER JOIN for pre_sale_phone_manual. This should also be done with an ON clause.

Further, you refer to columns from customer in the WHERE clause, which seems to render the LEFT JOIN on customer irrelevant. Change this to an inner join as well.

However, this is still not going to be quick. Your join of pre_sale_phone_manual and order uses DATE_ADD, which forces a calculation on a field and likely prevents any useful use of an index on that join.

The same applies to the check of the phone field on the customer and pre_sale_phone_manual tables (especially as the LIKE uses a leading wildcard).
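For the date check mentioned above, one way to keep the indexed column bare is to move the interval arithmetic to the other side of the comparison. Since o.created < p.created + 183 days says the same thing as p.created > o.created - 183 days, the range on p.created can be written without wrapping p.created in a function. A sketch of the idea, with the phone match left out for brevity:

```sql
-- Original form: p.created is wrapped in DATE_ADD
--   o.created > p.created
--   AND o.created < DATE_ADD(p.created, INTERVAL 183 DAY)

-- Equivalent form: p.created stands alone in each comparison.
-- The phone condition is omitted here; add it back before running
-- this against real data.
SELECT o.order_id,
       p.pre_sale_phone_manual_id AS id
FROM `order` o
INNER JOIN `pre_sale_phone_manual` p
        ON p.created < o.created
       AND p.created > DATE_SUB(o.created, INTERVAL 183 DAY)
WHERE o.created > '2013-12-30 08:28:37';
```

Whether the optimizer then actually uses the index on pre_sale_phone_manual.created depends on the data and the MySQL version, so compare the EXPLAIN output before and after.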

How many records are there in pre_sale_phone_manual for each resulting row? If it is a large number, it might be worth using a subquery to exclude all but the latest one.

SELECT o.order_id,
       p.pre_sale_phone_manual_id AS id,
       p.created,
       p.user_id
FROM `order` o
INNER JOIN 
(   
    SELECT pre_sale_phone_manual_id, MAX(created) AS max_created
    FROM `pre_sale_phone_manual`
    GROUP BY pre_sale_phone_manual_id
) p_sub
ON o.created > p_sub.max_created AND o.created < (DATE_ADD(p_sub.max_created, INTERVAL 183 DAY))
INNER JOIN pre_sale_phone_manual p
ON p.pre_sale_phone_manual_id =  p_sub.pre_sale_phone_manual_id
AND p.created =  p_sub.max_created 
INNER JOIN `customer` c ON c.customer_id = o.customer_id
WHERE p.phone <> ""
  AND REPLACE(REPLACE(REPLACE(REPLACE(c.phone, "-", ""), ".", ""), "+", ""), " ", "") LIKE CONCAT('%', RIGHT(REPLACE(REPLACE(REPLACE(REPLACE(p.phone, "-", ""), ".", ""), "+", ""), " ", ""), 10))
  AND o.created > '2013-12-30 08:28:37'

Answer 4

Tuning is hard when one doesn't have the exact data to play with. But anyway...

You have a weird-looking self join on pre_sale_phone_manual that uses the same column on both sides (!?). This looks somewhat like a mistake. In any case, MySQL supports analytic functions (window functions, from MySQL 8.0), and I think your self join can be transformed into a single table access using those.
Others have already noticed that the LIKE condition on denormalized phone numbers is going to hurt. I'd suggest the following: add a column INVERSE_PHONE on p and c which contains the phone number, normalized as needed in your select and reversed (back to front); maintain it using triggers. Put an index on that column on p and use it in the WHERE clause. This basically replaces a function-based index, which it seems was planned for MySQL but has vanished without a trace as far as I can tell.
If this still doesn't do the trick, do the same for (DATE_ADD(p.created, INTERVAL 183 DAY)), and put all the columns of p that are used in the select into a single index, beginning with the most selective column.
All the conditions that have one table on one side and a different one on the other are part of the join, so put them in the join condition and not in the WHERE clause. This hopefully has no effect on performance, but it makes the statement easier to read.
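The INVERSE_PHONE idea could be sketched as follows for the customer table. This is an illustration under stated assumptions rather than a tested migration: the column, index and trigger names are hypothetical, and it assumes stripped phone numbers fit in 64 characters.

```sql
-- Assumptions: inverse_phone, the index name and the trigger names are
-- hypothetical, and stripped phone numbers fit in 64 characters.
ALTER TABLE customer
  ADD COLUMN inverse_phone VARCHAR(64),
  ADD INDEX idx_customer_inverse_phone (inverse_phone);

-- Keep the column in sync on insert and on update.
CREATE TRIGGER customer_inverse_phone_bi BEFORE INSERT ON customer
FOR EACH ROW
  SET NEW.inverse_phone = REVERSE(
    REPLACE(REPLACE(REPLACE(REPLACE(NEW.phone, '-', ''), '.', ''), '+', ''), ' ', ''));

CREATE TRIGGER customer_inverse_phone_bu BEFORE UPDATE ON customer
FOR EACH ROW
  SET NEW.inverse_phone = REVERSE(
    REPLACE(REPLACE(REPLACE(REPLACE(NEW.phone, '-', ''), '.', ''), '+', ''), ' ', ''));

-- One-off backfill for rows that already exist.
UPDATE customer
SET inverse_phone = REVERSE(
    REPLACE(REPLACE(REPLACE(REPLACE(phone, '-', ''), '.', ''), '+', ''), ' ', ''));
```

With the same column on pre_sale_phone_manual, the suffix test turns into a prefix test, c.inverse_phone LIKE CONCAT(LEFT(p.inverse_phone, 10), '%'), which no longer has a leading wildcard. Whether MySQL uses the index for a pattern that comes from another table's row is worth verifying with EXPLAIN.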

Answer 5

I am more familiar with Oracle, but what about indexes? They can speed up queries a lot and avoid full table scans, especially for left outer joins. From the EXPLAIN output I see that no such indexes are used.

Try to place smart indexes. Again, I have worked with Oracle, but I think MySQL should also place indexes on primary and foreign keys.

5 answers

Answer 1
3 2014-07-07 06:24:31

Answer 2
1 2014-07-06 14:12:36

Answer 3
1 2014-07-09 09:19:42

Answer 4
1 2014-07-11 10:58:32

Answer 5
0 2014-07-01 07:15:15
