简体   繁体   中英

Why does a MySQL query with a Dependent Subquery take so much longer to execute than executing each statement individually?

I'm running two queries.

The first one gets unique IDs. This executes in ~350ms.

select parent_id
from duns_match_sealed_air_072815
group by duns_number

Then I paste those IDs into this second query. With >10k ids pasted in, it also executes in about ~350ms.

select term, count(*) as count
from companies, business_types, business_types_to_companies
where
    business_types.id = business_types_to_companies.term_id
    and companies.id = business_types_to_companies.company_id
    and raw_score > 25
    and diversity = 1
    and company_id in (paste,ten,thousand,ids,here)
group by term
order by count desc;

When I combine these queries into one it takes a long time to execute. I don't know how long because I stopped it after minutes.

select term, count(*) as count
from companies, business_types, business_types_to_companies
where
    business_types.id = business_types_to_companies.term_id
    and companies.id = business_types_to_companies.company_id
    and raw_score > 25
    and diversity = 1
    and company_id in (
        select parent_id
        from duns_match_sealed_air_072815
        group by duns_number
    )
group by term
order by count desc;

What is going on?

It's down to the way it processes the query - I believe it has to run your embedded query once for each row, whereas using two queries allows you to store the result.

Hope this helps!

The query has been re-written using JOIN , but particularly I've used EXISTS instead of IN . This is a short in the dark. It is possible that there may be many values generated in the sub-query causing the outer query to struggle while it goes through matching each item returned from the sub-query.

select term, count(*) as count
from companies c
inner join business_types_to_companies bc on bc.company_id = c.id
inner join business_types b on b.id = bc.term_id
where 
    raw_score > 25
    and diversity = 1
    and exists (
        select 1 
        from duns_match_sealed_air_072815
        where parent_id = c.id
    )
group by term
order by count desc;

First, with respect, your subquery doesn't use GROUP BY in a sensible way.

select parent_id         /* wrong GROUP BY */
  from duns_match_sealed_air_072815
 group by duns_number

In fact, it misuses the pernicious MySQL extension to GROUP BY . Read this. http://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html . I can't tell what your application logic intends from this query, but I can tell you that it actually returns an unpredictably selected parent_id value associated with each distinct duns_number value.

Do you want

select MIN(parent_id) parent_id  
  from duns_match_sealed_air_072815
 group by duns_number

or something like that? That one selects the lowest parent ID associated with each given number.

Sometimes MySQL has a hard time optimizing the WHERE .... IN () query pattern. Try a join instead. Like this:

select term, count(*) as count
  from companies
  join (
          select MIN(parent_id) parent_id
            from duns_match_sealed_air_072815
           group by duns_number
       ) idlist ON companies.id = idlist.parent_id
  join business_types_to_companies ON companies.id = business_types_to_companies.company_id
  join business_types ON business_types.id = business_types_to_companies.term_id
 where raw_score > 25
   and diversity = 1
 group by term 
 order by count desc

To optimize this further we'll need to see the table definitions and the output from EXPLAIN .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM