
Optimizing MySQL Query - search condition on multiple columns

I am using MySQL 5.7.25 and this is the query I am trying to optimize:

SELECT a.contract,
       a.phone_number_1,
       a.phone_number_2,
       a.phone_number_3,
       a.phone_number_4,
       a.phone_number_5
  FROM tempdb.customer_crm a
 WHERE CHAR_LENGTH(a.contract) = 12
   AND (
         a.contract in (SELECT contract_final FROM tempdb.relevant_contracts)
         OR a.phone_number_1 in (SELECT phone_number FROM tempdb.relevant_numbers_1)
         OR a.phone_number_2 in (SELECT phone_number FROM tempdb.relevant_numbers_2)
         OR a.phone_number_3 in (SELECT phone_number FROM tempdb.relevant_numbers_3)
         OR a.phone_number_4 in (SELECT phone_number FROM tempdb.relevant_numbers_4)
         OR a.phone_number_5 in (SELECT phone_number FROM tempdb.relevant_numbers_5)
       );

The customer_crm table has 5 different phone numbers in 5 columns. I need to select all the records where any of the 5 phone numbers exists in the table relevant_numbers. I have made 5 copies of relevant_numbers because I can only use TEMPORARY tables, and MySQL cannot refer to a TEMPORARY table more than once in the same query. Row counts:

  • customer_crm: 80 Million
  • relevant_numbers: 63 Thousand
  • relevant_contracts: 93 Thousand
  • Result of the query: 100 Thousand

This query takes too long. I shaved off a few minutes by adding a length condition on each phone number:

SELECT a.contract,
       a.phone_number_1,
       a.phone_number_2,
       a.phone_number_3,
       a.phone_number_4,
       a.phone_number_5
  FROM tempdb.customer_crm a
 WHERE CHAR_LENGTH(a.contract) = 12
   AND (
         a.contract in (SELECT contract_final FROM tempdb.relevant_contracts)
         OR (CHAR_LENGTH(a.phone_number_1) > 9 AND a.phone_number_1 in (SELECT phone_number FROM tempdb.relevant_numbers_1))
         OR (CHAR_LENGTH(a.phone_number_2) > 9 AND a.phone_number_2 in (SELECT phone_number FROM tempdb.relevant_numbers_2))
         OR (CHAR_LENGTH(a.phone_number_3) > 9 AND a.phone_number_3 in (SELECT phone_number FROM tempdb.relevant_numbers_3))
         OR (CHAR_LENGTH(a.phone_number_4) > 9 AND a.phone_number_4 in (SELECT phone_number FROM tempdb.relevant_numbers_4))
         OR (CHAR_LENGTH(a.phone_number_5) > 9 AND a.phone_number_5 in (SELECT phone_number FROM tempdb.relevant_numbers_5))
       );

It still takes about 10 minutes. I have tried EXISTS instead of IN, and it takes even longer. I have also tried a LEFT JOIN, which also takes longer. All the columns are individually indexed.

Any help will be appreciated. Thanks.

"customer_crm table has 5 different phone numbers in 5 columns. I need to filter all the records where any of the 5 phone numbers exists in table relevant_numbers."

Instead of checking each phone number against relevant_numbers individually, why not use EXISTS with an IN condition?

select c.*
from tempdb.customer_crm c
where 
    exists (
        select 1
        from tempdb.relevant_contracts o
        where o.contract_final = c.contract 
    )
    or exists (
        select 1
        from tempdb.relevant_numbers n
        where n.phone_number in (
            c.phone_number_1,
            c.phone_number_2,
            c.phone_number_3,
            c.phone_number_4,
            c.phone_number_5
        )
    )

For performance, you can try the following indexes:

customer_crm(
    contract, 
    phone_number_1,
    phone_number_2,
    phone_number_3,
    phone_number_4,
    phone_number_5
)
relevant_contracts(contract_final)
relevant_numbers (phone_number)
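
As a rough sketch, the index suggestions above could be created like this. The index names are made up for illustration, and the composite index assumes the six columns together fit within the storage engine's index key length limit, which depends on column types this thread does not show.

```sql
-- Index names (idx_*) are hypothetical; adjust to your own naming scheme.
CREATE INDEX idx_crm_contract_phones
    ON tempdb.customer_crm (
        contract,
        phone_number_1,
        phone_number_2,
        phone_number_3,
        phone_number_4,
        phone_number_5
    );

CREATE INDEX idx_rc_contract_final
    ON tempdb.relevant_contracts (contract_final);

CREATE INDEX idx_rn_phone_number
    ON tempdb.relevant_numbers (phone_number);
```

Running EXPLAIN on the query afterwards shows whether the optimizer actually picks these indexes up.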

I am also unsure that the check on the length of contract is beneficial: applying a function to the column makes the predicate non-sargable (i.e. it prevents the use of an index on that column).
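
If the length filter does need index support, one option available in MySQL 5.7 is an indexed generated column, so that the predicate compares a plain column instead of a function result. This is only a sketch: the column name contract_len and the index name are invented here, and whether it pays off depends on how selective "length = 12" is, which the question does not say.

```sql
-- Hypothetical column and index names; not part of the original schema.
ALTER TABLE tempdb.customer_crm
    ADD COLUMN contract_len INT UNSIGNED
        GENERATED ALWAYS AS (CHAR_LENGTH(contract)) VIRTUAL,
    ADD INDEX idx_crm_contract_len (contract_len);

-- The filter is now a simple comparison on an indexed column.
SELECT c.contract
  FROM tempdb.customer_crm AS c
 WHERE c.contract_len = 12;
```

Building the index on 80M rows takes time, so it is worth testing on a copy first.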

OR is a performance killer. So is IN ( SELECT ... ).

The query as it stands is going to do a full table scan of 80M rows, and do lookups into the temp tables. Each of those secondary lookups will touch only 1 row if you go to the effort of indexing your temp tables, or 63K rows otherwise. The unindexed case adds up to about 25 trillion row checks (80M rows × 5 lookups × 63K rows). It might finish this year.
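
To make each lookup a single index probe rather than a 63K-row scan, every temporary table needs an index on the column being searched. A minimal sketch, with made-up index names; the EXPLAIN at the end is one way to confirm how each table is accessed:

```sql
-- Index names are hypothetical. The same name can be reused per table.
ALTER TABLE tempdb.relevant_contracts ADD INDEX idx_contract_final (contract_final);
ALTER TABLE tempdb.relevant_numbers_1 ADD INDEX idx_phone_number (phone_number);
ALTER TABLE tempdb.relevant_numbers_2 ADD INDEX idx_phone_number (phone_number);
ALTER TABLE tempdb.relevant_numbers_3 ADD INDEX idx_phone_number (phone_number);
ALTER TABLE tempdb.relevant_numbers_4 ADD INDEX idx_phone_number (phone_number);
ALTER TABLE tempdb.relevant_numbers_5 ADD INDEX idx_phone_number (phone_number);

-- Inspect the plan for one of the lookups.
EXPLAIN
SELECT a.contract
  FROM tempdb.customer_crm a
 WHERE a.phone_number_1 IN (SELECT phone_number FROM tempdb.relevant_numbers_1);
```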

Plan A: Turn OR into UNION :

    -- Each branch can use its own index; UNION removes duplicate ids.
    (  SELECT  cc.id
            FROM  tempdb.customer_crm AS cc
            JOIN  tempdb.relevant_contracts AS rc
              ON  cc.contract = rc.contract_final
    )  UNION
    (  SELECT  cc.id
            FROM  tempdb.customer_crm AS cc
            JOIN  tempdb.relevant_numbers_1 AS rn
              ON  cc.phone_number_1 = rn.phone_number
    )  UNION
    (  SELECT  cc.id
            FROM  tempdb.customer_crm AS cc
            JOIN  tempdb.relevant_numbers_2 AS rn
              ON  cc.phone_number_2 = rn.phone_number
    )  UNION
    (  SELECT  cc.id
            FROM  tempdb.customer_crm AS cc
            JOIN  tempdb.relevant_numbers_3 AS rn
              ON  cc.phone_number_3 = rn.phone_number
    )  UNION
    (  SELECT  cc.id
            FROM  tempdb.customer_crm AS cc
            JOIN  tempdb.relevant_numbers_4 AS rn
              ON  cc.phone_number_4 = rn.phone_number
    )  UNION
    (  SELECT  cc.id
            FROM  tempdb.customer_crm AS cc
            JOIN  tempdb.relevant_numbers_5 AS rn
              ON  cc.phone_number_5 = rn.phone_number
    )

I am assuming that id is the PRIMARY KEY of customer_crm. You will need these indexes on customer_crm:

INDEX(contract, id)
INDEX(phone_number_1, id)
INDEX(phone_number_2, id)
INDEX(phone_number_3, id)
INDEX(phone_number_4, id)
INDEX(phone_number_5, id)
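
Written out as DDL, this might look like the sketch below. The index names are hypothetical. One caveat: in InnoDB every secondary index already carries the primary key columns, so if customer_crm is an InnoDB table with id as its primary key, the single-column indexes mentioned in the question may already serve the same purpose. Checking the existing indexes first avoids rebuilding them on 80M rows.

```sql
-- See what already exists before adding anything.
SHOW INDEX FROM tempdb.customer_crm;

-- Hypothetical index names; assumes id is the primary key.
ALTER TABLE tempdb.customer_crm
    ADD INDEX idx_contract_id (contract, id),
    ADD INDEX idx_phone1_id   (phone_number_1, id),
    ADD INDEX idx_phone2_id   (phone_number_2, id),
    ADD INDEX idx_phone3_id   (phone_number_3, id),
    ADD INDEX idx_phone4_id   (phone_number_4, id),
    ADD INDEX idx_phone5_id   (phone_number_5, id);
```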

Use the above query as a subquery, JOIN that back to customer_crm to get whatever columns you really need.
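
A sketch of that wrapping step, using the column list from the question. It assumes id is the primary key of customer_crm, that customer_crm is a regular table (a TEMPORARY table could not be referenced this many times in one query), and that the contract column in relevant_contracts is contract_final as in the question. The alias hits is made up.

```sql
SELECT c.contract,
       c.phone_number_1,
       c.phone_number_2,
       c.phone_number_3,
       c.phone_number_4,
       c.phone_number_5
  FROM (
        -- UNION (not UNION ALL) so each matching id appears once.
        SELECT cc.id FROM tempdb.customer_crm AS cc
          JOIN tempdb.relevant_contracts AS rc ON cc.contract = rc.contract_final
        UNION
        SELECT cc.id FROM tempdb.customer_crm AS cc
          JOIN tempdb.relevant_numbers_1 AS rn ON cc.phone_number_1 = rn.phone_number
        UNION
        SELECT cc.id FROM tempdb.customer_crm AS cc
          JOIN tempdb.relevant_numbers_2 AS rn ON cc.phone_number_2 = rn.phone_number
        UNION
        SELECT cc.id FROM tempdb.customer_crm AS cc
          JOIN tempdb.relevant_numbers_3 AS rn ON cc.phone_number_3 = rn.phone_number
        UNION
        SELECT cc.id FROM tempdb.customer_crm AS cc
          JOIN tempdb.relevant_numbers_4 AS rn ON cc.phone_number_4 = rn.phone_number
        UNION
        SELECT cc.id FROM tempdb.customer_crm AS cc
          JOIN tempdb.relevant_numbers_5 AS rn ON cc.phone_number_5 = rn.phone_number
       ) AS hits
  JOIN tempdb.customer_crm AS c ON c.id = hits.id
 -- The length filter from the question, applied only to the matched rows.
 WHERE CHAR_LENGTH(c.contract) = 12;
```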

That will be on the order of 1 million actions, which is much less.

The check for length = 12 can come later, in the outer query; it is a minor annoyance.

Plan B: Don't use 5 columns.

It is usually a bad schema design to have an array of things spread across multiple columns or packed together in a single column. Instead, have another table with (at least) 2 columns: the number and the id to join back to the main table.

With INDEX(number), it won't matter that it has 5*80M rows.
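
One possible shape for that table, as a sketch only. The table name customer_phone, the column names, and the column types are all assumptions, since the thread does not show the real definitions; customer_id must match the type of customer_crm.id.

```sql
-- Hypothetical table: one row per (customer, phone number) pair.
CREATE TABLE tempdb.customer_phone (
    customer_id  BIGINT UNSIGNED NOT NULL,   -- assumed type of customer_crm.id
    phone_number VARCHAR(20)     NOT NULL,   -- assumed type
    PRIMARY KEY (customer_id, phone_number),
    INDEX idx_phone_number (phone_number)
);

-- Unpivot the five columns; IGNORE skips a number repeated for one customer.
INSERT IGNORE INTO tempdb.customer_phone
    SELECT id, phone_number_1 FROM tempdb.customer_crm WHERE phone_number_1 IS NOT NULL;
INSERT IGNORE INTO tempdb.customer_phone
    SELECT id, phone_number_2 FROM tempdb.customer_crm WHERE phone_number_2 IS NOT NULL;
INSERT IGNORE INTO tempdb.customer_phone
    SELECT id, phone_number_3 FROM tempdb.customer_crm WHERE phone_number_3 IS NOT NULL;
INSERT IGNORE INTO tempdb.customer_phone
    SELECT id, phone_number_4 FROM tempdb.customer_crm WHERE phone_number_4 IS NOT NULL;
INSERT IGNORE INTO tempdb.customer_phone
    SELECT id, phone_number_5 FROM tempdb.customer_crm WHERE phone_number_5 IS NOT NULL;

-- A single join replaces the five lookups, and relevant_numbers is opened once.
SELECT DISTINCT cp.customer_id
  FROM tempdb.relevant_numbers AS rn
  JOIN tempdb.customer_phone   AS cp ON cp.phone_number = rn.phone_number;
```

Because relevant_numbers appears only once here, the five temporary copies are no longer needed.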

Plan C: Would you care to back up to before creating the temp tables? Other optimizations may be possible.

