
Optimizing MySQL Query

We have a query that is currently killing our database and I know there has to be a way to optimize it. We have 3 tables:

  1. items - table of items where each item has an associated object_id, length, difficulty_rating, rating, avg_rating & status
  2. lists - table of lists which are basically lists of items created by our users
  3. list_items - table with 2 columns: list_id, item_id

We've been using the following query to display a simple HTML table that shows each list and a number of attributes related to the list including averages of attributes of the included list items:

select object_id, user_id, slug, title, description, items, 
       city, state, country, created, updated,
       (select AVG(rating) from items
          where object_id IN 
              (select object_id from list_items where list_id=lists.object_id) 
            AND status="A"
       ) as 'avg_rating',
       (select AVG(avg_rating) from items
          where object_id IN 
              (select object_id from list_items where list_id=lists.object_id) 
            AND status="A"
       ) as 'avg_avg_rating',
       (select AVG(length) from items 
          where object_id IN 
              (select object_id from list_items where list_id=lists.object_id) 
            AND status="A"
       ) as 'avg_length',
       (select AVG(difficulty_rating) from items 
          where object_id IN
              (select object_id from list_items where list_id=lists.object_id) 
            AND status="A"
       ) as 'avg_difficulty' 
    from lists
    where user_id=$user_id AND status="A" 
    order by $orderby LIMIT $start,$step

The reason we haven't broken this up into one query to fetch the lists plus separate lookups to pull the averages for each list is that we want the user to be able to sort on the average columns (i.e. 'order by avg_difficulty').

Hopefully my explanation makes sense. There has to be a much more efficient way to do this and I'm hoping that a MySQL guru out there can point me in the right direction. Thanks!

It looks like you can replace all the subqueries with joins:

SELECT     l.object_id,
           l.user_id,
           <other columns from lists>,
           AVG(i.rating) as avgrating,
           AVG(i.avg_rating) as avgavgrating,
           <other averages>
FROM       lists l
LEFT JOIN  list_items li 
ON         li.list_id = l.object_id
LEFT JOIN  items i 
ON         i.object_id = li.object_id
           AND i.status = 'A'
WHERE      l.user_id = $user_id AND l.status = 'A' 
GROUP BY   l.object_id, l.user_id, <other columns from lists>

That would save a lot of work for the DB engine.

Here's how to find the bottleneck:

Add the keyword EXPLAIN before the SELECT. This will cause the engine to output how the SELECT was performed.

To learn more about query optimization with this method, see: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
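As a sketch, EXPLAIN can be prefixed to the join rewrite above (the literal 42 stands in for $user_id, and the column list is trimmed for brevity):

```sql
-- Prefix the query with EXPLAIN to see the execution plan instead of results
EXPLAIN
SELECT    l.object_id, AVG(i.rating) AS avg_rating
FROM      lists l
LEFT JOIN list_items li ON li.list_id = l.object_id
LEFT JOIN items i ON i.object_id = li.object_id AND i.status = 'A'
WHERE     l.user_id = 42 AND l.status = 'A'
GROUP BY  l.object_id;
```

In the plan, a `type` of ALL means a full table scan, `key` shows which index was chosen, and `rows` estimates how many rows get examined. A full scan on list_items or items usually points to a missing index on the join columns.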

A couple of things to consider:

  1. Make sure that all of your joins are indexed on both sides. For example, you join list_items.list_id=lists.object_id in several places. list_id and object_id should both have indexes on them.

  2. Have you looked into how much the averages actually vary? You might benefit from having a worker thread (or cron job) recalculate the averages periodically rather than putting the load on your RDBMS every time you run this query. You'd need to store the averages in a separate table, of course...

  3. Also, are you using status as an enum or a varchar? An enum has much lower cardinality; consider switching to that type if the status column has a limited range of values.
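For point 1, the indexes might look like this (a sketch; the index names are made up, and whether single-column or composite indexes serve you better depends on your workload):

```sql
-- Both sides of the list_items join
CREATE INDEX idx_list_items_list_id  ON list_items (list_id);
CREATE INDEX idx_items_object_status ON items (object_id, status);
-- Covers the WHERE clause on lists
CREATE INDEX idx_lists_user_status   ON lists (user_id, status);
```

The composite (object_id, status) index on items lets the engine resolve both the join condition and the status filter from the index.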

-aj

That's one hell of a query... you should probably edit your question and reformat the query so it's a bit more readable, although given its complexity, I'm not sure that's possible.

Anyway, the simple answer here is to denormalize your database a bit and cache all of your averages on the lists table itself in indexed decimal columns. All those subqueries are killing you.

The hard part, and what you'll have to figure out, is how to keep those averages updated. One generally easy way is to store the count of all items and the sum of their values in two separate fields. Any time an item is added, increment the count by 1 and the sum by the item's value, then set avg_field = sum_field / count_field.
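A hedged sketch of that bookkeeping (the column names rating_sum, rating_count and avg_rating are assumptions; here an item with rating 4 is added to list 123):

```sql
UPDATE lists
SET    rating_count = rating_count + 1,
       rating_sum   = rating_sum + 4,
       avg_rating   = (rating_sum + 4) / (rating_count + 1)
WHERE  object_id = 123;
```

Writing the average as (rating_sum + 4) / (rating_count + 1) explicitly avoids relying on MySQL's left-to-right evaluation of SET assignments, where later assignments see the already-updated values.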

Besides indexing, even a cursory analysis shows that your query contains a lot of redundancy that your DBMS's optimizer may not be able to spot (SQL is a redundant language: it admits too many equivalent, syntactically different expressions for the same query; this is a known and documented problem - see, for example, SQL Redundancy and DBMS Performance by Fabian Pascal).

I will rewrite your query, below, to highlight that:

let LI =

  select object_id from list_items where list_id=lists.object_id

in

  select object_id, user_id, slug, title, description, items, city, state, country, created, updated,
         (select AVG(rating)            from items where object_id IN LI AND status="A") as 'avg_rating',
         (select AVG(avg_rating)        from items where object_id IN LI AND status="A") as 'avg_avg_rating',
         (select AVG(length)            from items where object_id IN LI AND status="A") as 'avg_length',
         (select AVG(difficulty_rating) from items where object_id IN LI AND status="A") as 'avg_difficulty'
    from lists
   where user_id=$user_id AND status="A"
order by $orderby
   LIMIT $start, $step

Note: this is only the first step to refactor that beast.

I wonder why people rarely, if ever, use views, even if only to simplify SQL queries. They help in writing more manageable and refactorable queries.
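For instance, the repeated per-list aggregation could be packaged in a view (a sketch; the view name list_item_stats is made up, and 42 stands in for $user_id):

```sql
CREATE VIEW list_item_stats AS
SELECT li.list_id,
       AVG(i.rating)            AS avg_rating,
       AVG(i.avg_rating)        AS avg_avg_rating,
       AVG(i.length)            AS avg_length,
       AVG(i.difficulty_rating) AS avg_difficulty
FROM   list_items li
JOIN   items i ON i.object_id = li.object_id
WHERE  i.status = 'A'
GROUP BY li.list_id;

-- The original query then collapses to a single join:
SELECT l.*, s.avg_rating, s.avg_avg_rating, s.avg_length, s.avg_difficulty
FROM   lists l
LEFT JOIN list_item_stats s ON s.list_id = l.object_id
WHERE  l.user_id = 42 AND l.status = 'A'
ORDER BY s.avg_difficulty;
```

The LEFT JOIN keeps lists with no active items (their averages come back NULL, as the original correlated subqueries would return), and sorting on an average column still works because it is now an ordinary joined column.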
