
Optimize MySQL Sub-Query

Is there a way this query can be optimized? It looks redundant:

SELECT
        SUM((SELECT 
            IFNULL(SUM(trx.totalAmount), 0) 
            FROM trx
            WHERE 
            FIND_IN_SET (trx.clientOrderId, "B6A8DB9568,6E7705B487,59C4D4234D,1D9CD4EF96,4C373E8CDE,E818BEE48F,6610555669,ECF388E288,32FD93075C,B03417425B,18FD77061A,1C39E4BD04,C92B970E55,0920F06DFA,EEFB4AAADA,FC2D9FF9AD") > 0
            AND trx.txnType IN ('REFUND', 'VOID')
        )) as refunds,

      SUM((SELECT 
        IFNULL(SUM(trx.totalAmount), 0) 
        FROM trx
        WHERE 
            FIND_IN_SET (trx.clientOrderId, "B6A8DB9568,6E7705B487,59C4D4234D,1D9CD4EF96,4C373E8CDE,E818BEE48F,6610555669,ECF388E288,32FD93075C,B03417425B,18FD77061A,1C39E4BD04,C92B970E55,0920F06DFA,EEFB4AAADA,FC2D9FF9AD") > 0 
            AND trx.txnType = 'SALE'
            AND trx.billingCycleNumber != 1
      )) AS lifetimeRevenue

Please note that this is just part of the query. There are about 10 more terms like these in the original, so I really need to know whether it can be optimized.

Thanks, guys.

The problem with using subqueries like that is that each subquery has to scan the full table. Also, using FIND_IN_SET() the way you are using it forces a full table scan even if you have indexes. With the two terms shown plus the roughly 10 more you mention, you are doing about 12 full table scans.

Here's a solution that does not use subqueries at all. It scans the table for the matching clientOrderId values once, to get a superset of all the rows that match any of the txnType values you need.

Then each sum of totalAmount is conditional: if the row's txnType is one of the types that term needs, the row's totalAmount is added; otherwise zero is used for that row. Zero contributes nothing to the sum, so it's as if you had skipped the rows with non-matching txnType.

SELECT
  SUM(IF(trx.txnType IN ('REFUND', 'VOID'), trx.totalAmount, 0)) AS refunds,
  SUM(IF(trx.txnType = 'SALE' AND trx.billingCycleNumber != 1, trx.totalAmount, 0)) AS lifetimeRevenue
FROM trx
WHERE trx.clientOrderId IN (
    'B6A8DB9568', '6E7705B487', '59C4D4234D', '1D9CD4EF96', 
    '4C373E8CDE', 'E818BEE48F', '6610555669', 'ECF388E288',
    '32FD93075C', 'B03417425B', '18FD77061A', '1C39E4BD04',
    'C92B970E55', '0920F06DFA', 'EEFB4AAADA', 'FC2D9FF9AD')
  AND trx.txnType IN ('REFUND', 'VOID', 'SALE');

You should have an index on (clientOrderId) for this query. Since you have two IN() predicates, the WHERE clause will only use the index for the first column in the index anyway.
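
If the table already exists without that index, it could be added with something like the statement below. This is a minimal sketch: the index name idx_trx_clientOrderId is made up for illustration, and SHOW INDEX is only there to confirm the result.

```sql
-- Hypothetical index name; only the indexed column comes from the answer above.
ALTER TABLE trx ADD INDEX idx_trx_clientOrderId (clientOrderId);

-- List the indexes on trx to confirm the new one is present.
SHOW INDEX FROM trx;
```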

Don't use a FIND_IN_SET() expression, because it won't use an index for the WHERE clause.

You said there are 10 more terms in the query. So I anticipate that there are some different types of expressions in those terms. I'm not going to answer any "but what if the next terms look like something different...". I have shown you the method of unraveling the subquery into one single-pass query. Applying it to other terms in your query is up to you.
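
As a rough sketch of how the same pattern might extend, here are two extra terms that are purely hypothetical (firstCycleRevenue and voidCount are not from the original query), with the ID list shortened for readability. Each new term becomes another conditional aggregate over the same single scan, and the txnType list in the WHERE clause has to cover every type that any term needs. One difference worth knowing: SUM() over zero rows returns NULL, so the outer IFNULL() keeps the original behavior of returning 0 when no row matches the WHERE clause at all.

```sql
-- Sketch only: firstCycleRevenue and voidCount are hypothetical terms.
SELECT
  IFNULL(SUM(IF(trx.txnType IN ('REFUND', 'VOID'), trx.totalAmount, 0)), 0) AS refunds,
  IFNULL(SUM(IF(trx.txnType = 'SALE' AND trx.billingCycleNumber != 1, trx.totalAmount, 0)), 0) AS lifetimeRevenue,
  -- Hypothetical: sales in the first billing cycle only.
  IFNULL(SUM(IF(trx.txnType = 'SALE' AND trx.billingCycleNumber = 1, trx.totalAmount, 0)), 0) AS firstCycleRevenue,
  -- Hypothetical: number of VOID rows. COUNT() ignores NULL, so non-VOID rows are not counted.
  COUNT(IF(trx.txnType = 'VOID', 1, NULL)) AS voidCount
FROM trx
WHERE trx.clientOrderId IN ('B6A8DB9568', '6E7705B487')  -- shortened list for illustration
  AND trx.txnType IN ('REFUND', 'VOID', 'SALE');          -- union of all types used above
```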


Here's a demo I tested:

create table trx (
  clientOrderId char(10), 
  txnType enum('REFUND','VOID','SALE'), 
  totalAmount numeric(9,2), 
  billingCycleNumber int default 0,
  key (clientOrderId)
);

+---------------+---------+-------------+--------------------+
| clientOrderId | txnType | totalAmount | billingCycleNumber |
+---------------+---------+-------------+--------------------+
| B6A8DB9568    | REFUND  |       42.00 |                  0 |
| 59C4D4234D    | SALE    |       84.00 |                  0 |
+---------------+---------+-------------+--------------------+
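
The statement that loaded these rows is not shown. A reconstruction from the table output above, assuming the column order of the CREATE TABLE, might look like this:

```sql
-- Reconstructed from the two rows displayed above; not the original statement.
INSERT INTO trx (clientOrderId, txnType, totalAmount, billingCycleNumber) VALUES
  ('B6A8DB9568', 'REFUND', 42.00, 0),
  ('59C4D4234D', 'SALE',   84.00, 0);
```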

Here's the EXPLAIN for your query:

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | PRIMARY     | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | No tables used |
|  3 | SUBQUERY    | trx   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |    50.00 | Using where    |
|  2 | SUBQUERY    | trx   | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    2 |    50.00 | Using where    |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+

Notice there is one subquery for each term, and each one uses type=ALL (a full table scan) as its table access.

Here's the EXPLAIN for my query:

+----+-------------+-------+------------+-------+---------------+---------------+---------+------+------+----------+------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key           | key_len | ref  | rows | filtered | Extra                              |
+----+-------------+-------+------------+-------+---------------+---------------+---------+------+------+----------+------------------------------------+
|  1 | SIMPLE      | trx   | NULL       | range | clientOrderId | clientOrderId | 11      | NULL |   16 |    50.00 | Using index condition; Using where |
+----+-------------+-------+------------+-------+---------------+---------------+---------+------+------+----------+------------------------------------+

One simple table access, using an index.

The result from both your query and my query given the example data I tried:

+---------+-----------------+
| refunds | lifetimeRevenue |
+---------+-----------------+
|   42.00 |           84.00 |
+---------+-----------------+
