I am using MySQL 5.7.25 and this is the query I am trying to optimize:
SELECT a.contract,
a.phone_number_1,
a.phone_number_2,
a.phone_number_3,
a.phone_number_4,
a.phone_number_5
FROM tempdb.customer_crm a
WHERE CHAR_LENGTH(a.contract) = 12
AND (
a.contract in (SELECT contract_final FROM tempdb.relevant_contracts)
OR a.phone_number_1 in (SELECT phone_number FROM tempdb.relevant_numbers_1)
OR a.phone_number_2 in (SELECT phone_number FROM tempdb.relevant_numbers_2)
OR a.phone_number_3 in (SELECT phone_number FROM tempdb.relevant_numbers_3)
OR a.phone_number_4 in (SELECT phone_number FROM tempdb.relevant_numbers_4)
OR a.phone_number_5 in (SELECT phone_number FROM tempdb.relevant_numbers_5)
);
customer_crm table has 5 different phone numbers in 5 columns. I need to filter all the records where any of the 5 phone numbers exists in table relevant_numbers. I have made 5 copies of table relevant_numbers because I can only use TEMPORARY tables (which MySQL cannot refer to more than once in the same query). The number of records in:
This query takes too long. I have shaved off a few minutes by adding a phone number length condition:
SELECT a.contract,
a.phone_number_1,
a.phone_number_2,
a.phone_number_3,
a.phone_number_4,
a.phone_number_5
FROM tempdb.customer_crm a
WHERE CHAR_LENGTH(a.contract) = 12
AND (
a.contract in (SELECT contract_final FROM tempdb.relevant_contracts)
OR (CHAR_LENGTH(a.phone_number_1) > 9 AND a.phone_number_1 in (SELECT phone_number FROM tempdb.relevant_numbers_1))
OR (CHAR_LENGTH(a.phone_number_2) > 9 AND a.phone_number_2 in (SELECT phone_number FROM tempdb.relevant_numbers_2))
OR (CHAR_LENGTH(a.phone_number_3) > 9 AND a.phone_number_3 in (SELECT phone_number FROM tempdb.relevant_numbers_3))
OR (CHAR_LENGTH(a.phone_number_4) > 9 AND a.phone_number_4 in (SELECT phone_number FROM tempdb.relevant_numbers_4))
OR (CHAR_LENGTH(a.phone_number_5) > 9 AND a.phone_number_5 in (SELECT phone_number FROM tempdb.relevant_numbers_5))
);
It still takes about 10 minutes. I have tried using an EXISTS condition instead of IN, and it takes even longer. I have also tried using a LEFT JOIN, which also takes longer. All the columns are individually indexed.
Any help will be appreciated. Thanks.
Instead of checking each phone number individually against relevant_numbers, why not use EXISTS with an IN condition?
select c.*
from tempdb.customer_crm c
where
    exists (
        select 1
        from tempdb.relevant_contracts o
        where o.contract_final = c.contract
    )
    or exists (
        select 1
        from tempdb.relevant_numbers n
        where n.phone_number in (
            c.phone_number_1,
            c.phone_number_2,
            c.phone_number_3,
            c.phone_number_4,
            c.phone_number_5
        )
    );
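A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL 5.7 and a handful of made-up rows, that checks the EXISTS ... IN (column list) form returns the same contracts as the question's OR of five IN subqueries. It says nothing about speed, since SQLite's optimizer is not MySQL's, and like the query above it leaves out the CHAR_LENGTH(contract) = 12 filter.

```python
import sqlite3

# SQLite stands in for MySQL here; names follow the question, rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_crm (
    contract TEXT,
    phone_number_1 TEXT, phone_number_2 TEXT, phone_number_3 TEXT,
    phone_number_4 TEXT, phone_number_5 TEXT);
CREATE TABLE relevant_contracts (contract_final TEXT);
CREATE TABLE relevant_numbers (phone_number TEXT);
""")
conn.executemany("INSERT INTO customer_crm VALUES (?, ?, ?, ?, ?, ?)", [
    ("C00000000001", "5550000001", None, None, None, None),  # phone 1 matches
    ("C00000000002", None, None, None, None, "5550000002"),  # phone 5 matches
    ("C00000000003", "5559999999", None, None, None, None),  # no match
    ("C00000000004", None, None, None, None, None),          # contract matches
])
conn.execute("INSERT INTO relevant_contracts VALUES ('C00000000004')")
conn.executemany("INSERT INTO relevant_numbers VALUES (?)",
                 [("5550000001",), ("5550000002",)])

# The answer's form: one EXISTS whose IN list names all five phone columns,
# so relevant_numbers is referenced only once.
exists_sql = """
SELECT c.contract FROM customer_crm c
WHERE EXISTS (SELECT 1 FROM relevant_contracts o
              WHERE o.contract_final = c.contract)
   OR EXISTS (SELECT 1 FROM relevant_numbers n
              WHERE n.phone_number IN (c.phone_number_1, c.phone_number_2,
                                       c.phone_number_3, c.phone_number_4,
                                       c.phone_number_5))
ORDER BY c.contract"""

# The question's form: one IN (SELECT ...) per column, joined with OR.
# SQLite lets the same table appear several times, so no copies are needed here.
or_sql = """
SELECT c.contract FROM customer_crm c
WHERE c.contract IN (SELECT contract_final FROM relevant_contracts)
   OR c.phone_number_1 IN (SELECT phone_number FROM relevant_numbers)
   OR c.phone_number_2 IN (SELECT phone_number FROM relevant_numbers)
   OR c.phone_number_3 IN (SELECT phone_number FROM relevant_numbers)
   OR c.phone_number_4 IN (SELECT phone_number FROM relevant_numbers)
   OR c.phone_number_5 IN (SELECT phone_number FROM relevant_numbers)
ORDER BY c.contract"""

exists_rows = [row[0] for row in conn.execute(exists_sql)]
or_rows = [row[0] for row in conn.execute(or_sql)]
print(exists_rows)  # ['C00000000001', 'C00000000002', 'C00000000004']
```

NULL phone slots are harmless in both forms: a comparison with NULL is never true, so an empty column simply never matches.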
For performance, you can try the following indexes:
customer_crm(contract, phone_number_1, phone_number_2, phone_number_3, phone_number_4, phone_number_5)
relevant_contracts(contract_final)
relevant_numbers(phone_number)
I am also not sure that the check on the length of contract is beneficial: wrapping the column in a function makes that predicate non-SARGable (i.e., it prevents the use of an index).
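To illustrate what "non-SARGable" means in practice, here is a small sketch that asks SQLite (standing in for MySQL, whose EXPLAIN output looks different) for its query plan with and without a function around an indexed column. The table, the extra column and the index name idx_contract are illustrative assumptions; SQLite's length() plays the role of CHAR_LENGTH().

```python
import sqlite3

# SQLite's planner stands in for MySQL's; table and index are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_crm "
             "(id INTEGER PRIMARY KEY, contract TEXT, phone_number_1 TEXT)")
conn.execute("CREATE INDEX idx_contract ON customer_crm (contract)")

def plan(sql):
    """Return the detail text of every EXPLAIN QUERY PLAN row, joined."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(str(row[-1]) for row in rows)

# Bare column compared to a constant: the planner can seek into idx_contract.
plan_bare = plan("SELECT * FROM customer_crm WHERE contract = 'C00000000001'")
# Same column wrapped in length(): no index seek is possible, so it scans.
plan_func = plan("SELECT * FROM customer_crm WHERE length(contract) = 12")
print(plan_bare)  # expect a SEARCH step that uses idx_contract
print(plan_func)  # expect a full SCAN with no SEARCH step
```

The exact wording of the plan lines varies between SQLite versions, which is why only the SEARCH/SCAN distinction is worth relying on.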
OR is a performance killer. So is IN ( SELECT ... ).
The query as it stands is going to do a full table scan of 80M rows and do lookups into the temp tables. Those secondary lookups will touch only 1 row each if you go to the effort of indexing your temp tables, or 63K rows otherwise. That would add up to about 25 trillion lookups (80M rows * 63K rows * 5 phone columns). It might finish this year.
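The estimate above is plain multiplication; this sketch just spells it out with the answer's own figures (80M customer rows, 63K rows per temp table, 5 phone columns), and adds the indexed case, where each probe touches roughly one index entry instead of a whole temp table.

```python
# Figures quoted in the answer; they are estimates, not measurements.
customer_rows = 80_000_000  # rows in customer_crm, all scanned
temp_rows = 63_000          # rows in each relevant_numbers_N copy
phone_columns = 5           # phone_number_1 .. phone_number_5

# Unindexed temp tables: each probe may read every temp-table row.
unindexed = customer_rows * temp_rows * phone_columns
# Indexed temp tables: each probe touches about one index entry.
indexed = customer_rows * phone_columns

print(f"{unindexed:,}")  # 25,200,000,000,000
print(f"{indexed:,}")    # 400,000,000
```

Even the indexed case still costs hundreds of millions of probes, because the full scan of customer_crm is what drives them.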
Plan A: Turn OR into UNION:
( SELECT cc.id
FROM tempdb.customer_crm AS cc
JOIN tempdb.relevant_contracts AS rc
ON cc.contract = rc.contract_final
) UNION
( SELECT cc.id
FROM tempdb.customer_crm AS cc
JOIN tempdb.relevant_numbers_1 AS rn
ON cc.phone_number_1 = rn.phone_number
) UNION
( SELECT cc.id
FROM tempdb.customer_crm AS cc
JOIN tempdb.relevant_numbers_2 AS rn
ON cc.phone_number_2 = rn.phone_number
) UNION
( SELECT cc.id
FROM tempdb.customer_crm AS cc
JOIN tempdb.relevant_numbers_3 AS rn
ON cc.phone_number_3 = rn.phone_number
) UNION
( SELECT cc.id
FROM tempdb.customer_crm AS cc
JOIN tempdb.relevant_numbers_4 AS rn
ON cc.phone_number_4 = rn.phone_number
) UNION
( SELECT cc.id
FROM tempdb.customer_crm AS cc
JOIN tempdb.relevant_numbers_5 AS rn
ON cc.phone_number_5 = rn.phone_number
)
I am assuming that id is the PRIMARY KEY of customer_crm. You will need these indexes on customer_crm:
INDEX(contract, id)
INDEX(phone_number_1, id)
INDEX(phone_number_2, id)
INDEX(phone_number_3, id)
INDEX(phone_number_4, id)
INDEX(phone_number_5, id)
Use the above query as a subquery, and JOIN that back to customer_crm to get whatever columns you really need.
That will be on the order of 1 million operations, far fewer.
The check for length = 12 can come later, in the outer query, as a minor annoyance.
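A minimal end-to-end sketch of Plan A, again with Python's sqlite3 standing in for MySQL 5.7 and made-up rows: it builds the six-branch UNION (with the (column, id) indexes listed above), checks that it returns the same ids as the original OR form, then joins the ids back to customer_crm and applies the length check last. SQLite does not accept parentheses around UNION members, so they are dropped here, and length() plays the role of CHAR_LENGTH(). Index names such as idx_pn1 are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL 5.7; this checks results, not speed. Data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customer_crm (
    id INTEGER PRIMARY KEY, contract TEXT,
    phone_number_1 TEXT, phone_number_2 TEXT, phone_number_3 TEXT,
    phone_number_4 TEXT, phone_number_5 TEXT)""")
conn.execute("CREATE TABLE relevant_contracts (contract_final TEXT)")
conn.execute("INSERT INTO relevant_contracts VALUES ('C00000000004')")
conn.execute("CREATE INDEX idx_contract ON customer_crm (contract, id)")
for n in range(1, 6):
    # Five copies of relevant_numbers, mirroring the question's workaround.
    conn.execute(f"CREATE TABLE relevant_numbers_{n} (phone_number TEXT)")
    conn.executemany(f"INSERT INTO relevant_numbers_{n} VALUES (?)",
                     [("5550000001",), ("5550000002",)])
    # INDEX(phone_number_N, id), as suggested in the answer.
    conn.execute(f"CREATE INDEX idx_pn{n} ON customer_crm (phone_number_{n}, id)")

conn.executemany("INSERT INTO customer_crm VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "C00000000001", "5550000001", "5550000002", None, None, None),  # two matches
    (2, "C00000000002", None, None, "5550000002", None, None),          # phone 3
    (3, "C00000000003", "5559999999", None, None, None, None),          # no match
    (4, "C00000000004", None, None, None, None, None),                  # contract
    (5, "C123", None, None, None, None, "5550000001"),                  # short contract
])

# One JOIN branch per condition; UNION drops duplicate ids (row 1 matches twice).
branches = ["SELECT cc.id AS id FROM customer_crm AS cc "
            "JOIN relevant_contracts AS rc ON cc.contract = rc.contract_final"]
branches += [f"SELECT cc.id AS id FROM customer_crm AS cc "
             f"JOIN relevant_numbers_{n} AS rn ON cc.phone_number_{n} = rn.phone_number"
             for n in range(1, 6)]
matched_ids = " UNION ".join(branches)
union_ids = sorted(row[0] for row in conn.execute(matched_ids))

# The original OR / IN (SELECT ...) form, for comparison.
or_sql = ("SELECT id FROM customer_crm "
          "WHERE contract IN (SELECT contract_final FROM relevant_contracts)")
or_sql += "".join(f" OR phone_number_{n} IN "
                  f"(SELECT phone_number FROM relevant_numbers_{n})"
                  for n in range(1, 6))
or_ids = sorted(row[0] for row in conn.execute(or_sql))

# Join the id list back to customer_crm and apply the length check last.
final_sql = f"""
SELECT c.id, c.contract
FROM ({matched_ids}) AS m
JOIN customer_crm AS c ON c.id = m.id
WHERE length(c.contract) = 12
ORDER BY c.id"""
final_rows = conn.execute(final_sql).fetchall()
print(union_ids, or_ids)  # [1, 2, 4, 5] [1, 2, 4, 5]
print(final_rows)  # [(1, 'C00000000001'), (2, 'C00000000002'), (4, 'C00000000004')]
```

Row 5 shows why the length check can wait: it survives the UNION and is only dropped by the outer filter.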
Plan B: Don't use 5 columns.
It is usually a bad schema design to have an array of things spread across multiple columns or packed together in a single column. Instead, have another table with (at least) 2 columns: the number and the id to join back to the main table.
With INDEX(number), it won't matter that it has 5*80M rows.
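A minimal sketch of Plan B, assuming sqlite3 as a stand-in for MySQL and made-up rows. The child table name customer_phone and the index name idx_number are hypothetical; the two columns (id, number) follow the answer. It unpivots the five phone columns into rows once, indexes number, and then a single join replaces the five-way OR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_crm (
    id INTEGER PRIMARY KEY, contract TEXT,
    phone_number_1 TEXT, phone_number_2 TEXT, phone_number_3 TEXT,
    phone_number_4 TEXT, phone_number_5 TEXT);
CREATE TABLE relevant_numbers (phone_number TEXT);
-- Hypothetical child table: one row per (customer id, phone number).
CREATE TABLE customer_phone (id INTEGER NOT NULL, number TEXT NOT NULL);
""")
conn.executemany("INSERT INTO customer_crm VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (1, "C00000000001", "5550000001", "5551111111", None, None, None),
    (2, "C00000000002", "5552222222", None, None, None, None),
    (3, "C00000000003", None, None, None, None, "5550000002"),
])
conn.executemany("INSERT INTO relevant_numbers VALUES (?)",
                 [("5550000001",), ("5551111111",), ("5550000002",)])

# Unpivot the five columns into rows, skipping empty slots.
conn.execute("INSERT INTO customer_phone (id, number) " + " UNION ALL ".join(
    f"SELECT id, phone_number_{n} FROM customer_crm "
    f"WHERE phone_number_{n} IS NOT NULL"
    for n in range(1, 6)))
conn.execute("CREATE INDEX idx_number ON customer_phone (number)")  # INDEX(number)

# One join on one indexed column replaces the five-way OR; DISTINCT collapses
# customers that match on more than one of their numbers (customer 1 here).
matches = conn.execute("""
SELECT DISTINCT c.id, c.contract
FROM relevant_numbers AS r
JOIN customer_phone AS p ON p.number = r.phone_number
JOIN customer_crm AS c ON c.id = p.id
ORDER BY c.id""").fetchall()
print(matches)  # [(1, 'C00000000001'), (3, 'C00000000003')]
```

In a real migration the five phone columns would eventually be dropped from customer_crm, so the child table becomes the only place numbers live.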
Plan C: Would you care to back up to before the temp tables were created? Other optimizations may be possible.