
Optimizing MySQL Query

We have a query that is currently killing our database and I know there has to be a way to optimize it. We have 3 tables:

  1. items - table of items where each item has an associated object_id, length, difficulty_rating, rating, avg_rating & status
  2. lists - table of lists which are basically lists of items created by our users
  3. list_items - table with 2 columns: list_id, item_id

We've been using the following query to display a simple HTML table that shows each list and a number of attributes related to the list including averages of attributes of the included list items:

select object_id, user_id, slug, title, description, items, 
       city, state, country, created, updated,
       (select AVG(rating) from items
          where object_id IN 
              (select object_id from list_items where list_id=lists.object_id) 
            AND status="A"
       ) as 'avg_rating',
       (select AVG(avg_rating) from items
          where object_id IN 
              (select object_id from list_items where list_id=lists.object_id) 
            AND status="A"
       ) as 'avg_avg_rating',
       (select AVG(length) from items 
          where object_id IN 
              (select object_id from list_items where list_id=lists.object_id) 
            AND status="A"
       ) as 'avg_length',
       (select AVG(difficulty_rating) from items 
          where object_id IN
              (select object_id from list_items where list_id=lists.object_id) 
            AND status="A"
       ) as 'avg_difficulty' 
    from lists
    where user_id=$user_id AND status="A" 
    order by $orderby LIMIT $start,$step

The reason we haven't broken this up into one query to fetch the lists plus separate lookups to pull the averages for each list is that we want the user to be able to sort on the average columns (i.e. 'order by avg_difficulty').

Hopefully my explanation makes sense. There has to be a much more efficient way to do this and I'm hoping that a MySQL guru out there can point me in the right direction. Thanks!

It looks like you can replace all the subqueries with joins:

SELECT     l.object_id,
           l.user_id,
           <other columns from lists>,
           AVG(i.rating) as avgrating,
           AVG(i.avg_rating) as avgavgrating,
           <other averages>
FROM       lists l
LEFT JOIN  list_items li 
ON         li.list_id = l.object_id
LEFT JOIN  items i 
ON         i.object_id = li.object_id
           AND i.status = 'A'
WHERE      l.user_id = $user_id AND l.status = 'A' 
GROUP BY   l.object_id, l.user_id, <other columns from lists>

That would save a lot of work for the DB engine.

Here's how to find the bottleneck:

Add the keyword EXPLAIN before the SELECT. This will cause the engine to output how the SELECT was performed.

To learn more about query optimization with this method, see: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
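As a sketch, EXPLAIN can be prefixed to the join rewrite above (the literal 42 stands in for $user_id, and the column list is trimmed for brevity):

```sql
-- Prefix the query with EXPLAIN to see the execution plan instead of results
EXPLAIN
SELECT    l.object_id, AVG(i.rating) AS avg_rating
FROM      lists l
LEFT JOIN list_items li ON li.list_id = l.object_id
LEFT JOIN items i ON i.object_id = li.object_id AND i.status = 'A'
WHERE     l.user_id = 42 AND l.status = 'A'
GROUP BY  l.object_id;
```

In the plan, a `type` of ALL means a full table scan, `key` shows which index was chosen, and `rows` estimates how many rows get examined. A full scan on list_items or items usually points to a missing index on the join columns.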

A couple of things to consider:

  1. Make sure that all of your joins are indexed on both sides. For example, you join list_items.list_id=lists.object_id in several places. list_id and object_id should both have indexes on them.

  2. Have you looked into how much the averages actually vary? You might benefit from having a worker thread (or cron job) recalculate the averages periodically rather than putting the load on your RDBMS every time you run this query. You'd need to store the averages in a separate table, of course...

  3. Also, are you using status as an enum or a varchar? An enum has much lower cardinality; consider switching to that type if the status column has a limited range of values.
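For point 1, the indexes might look like this (a sketch; the index names are made up, and whether single-column or composite indexes serve you better depends on your workload):

```sql
-- Both sides of the list_items join
CREATE INDEX idx_list_items_list_id  ON list_items (list_id);
CREATE INDEX idx_items_object_status ON items (object_id, status);
-- Covers the WHERE clause on lists
CREATE INDEX idx_lists_user_status   ON lists (user_id, status);
```

The composite (object_id, status) index on items lets the engine resolve both the join condition and the status filter from the index.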

-aj

That's one hell of a query... you should probably edit your question and reformat the query so it's a bit more readable, although given its complexity, I'm not sure that's possible.

Anyway, the simple answer here is to denormalize your database a bit and cache all of your averages on the lists table itself in indexed decimal columns. All those subqueries are killing you.

The hard part, and what you'll have to figure out, is how to keep those averages updated. One generally easy way is to store the count of all items and the sum of their values in two separate fields. Any time an item is added, increment the count by 1 and the sum by the item's value, then set avg_field = sum_field / count_field.
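A hedged sketch of that bookkeeping (the column names rating_sum, rating_count and avg_rating are assumptions; here an item with rating 4 is added to list 123):

```sql
UPDATE lists
SET    rating_count = rating_count + 1,
       rating_sum   = rating_sum + 4,
       avg_rating   = (rating_sum + 4) / (rating_count + 1)
WHERE  object_id = 123;
```

Writing the average as (rating_sum + 4) / (rating_count + 1) explicitly avoids relying on MySQL's left-to-right evaluation of SET assignments, where later assignments see the already-updated values.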

Besides indexing, even a cursory analysis shows that your query contains a lot of redundancy that your DBMS's optimizer may not be able to spot (SQL is a redundant language: it admits too many equivalent, syntactically different expressions for the same query; this is a known and documented problem - see, for example, SQL Redundancy and DBMS Performance by Fabian Pascal).

I will rewrite your query, below, to highlight that:

let LI =

  select object_id from list_items where list_id=lists.object_id

in

  select object_id, user_id, slug, title, description, items, city, state, country, created, updated,
         (select AVG(rating)            from items where object_id IN LI AND status="A") as 'avg_rating',
         (select AVG(avg_rating)        from items where object_id IN LI AND status="A") as 'avg_avg_rating',
         (select AVG(length)            from items where object_id IN LI AND status="A") as 'avg_length',
         (select AVG(difficulty_rating) from items where object_id IN LI AND status="A") as 'avg_difficulty'
    from lists
   where user_id=$user_id AND status="A"
order by $orderby
   LIMIT $start, $step

Note: this is only the first step to refactor that beast.

I wonder why people rarely, if ever, use views, even if only to simplify SQL queries. They help in writing more manageable and refactorable queries.
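For instance, the repeated per-list aggregation could be packaged in a view (a sketch; the view name list_item_stats is made up, and 42 stands in for $user_id):

```sql
CREATE VIEW list_item_stats AS
SELECT li.list_id,
       AVG(i.rating)            AS avg_rating,
       AVG(i.avg_rating)        AS avg_avg_rating,
       AVG(i.length)            AS avg_length,
       AVG(i.difficulty_rating) AS avg_difficulty
FROM   list_items li
JOIN   items i ON i.object_id = li.object_id
WHERE  i.status = 'A'
GROUP BY li.list_id;

-- The original query then collapses to a single join:
SELECT l.*, s.avg_rating, s.avg_avg_rating, s.avg_length, s.avg_difficulty
FROM   lists l
LEFT JOIN list_item_stats s ON s.list_id = l.object_id
WHERE  l.user_id = 42 AND l.status = 'A'
ORDER BY s.avg_difficulty;
```

The LEFT JOIN keeps lists with no active items (their averages come back NULL, as the original correlated subqueries would return), and sorting on an average column still works because it is now an ordinary joined column.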
