Index a query “WHERE a IN (1,2,3) AND b = 4”

Question

I am attempting to apply an index that will speed up one of the slowest queries in my application:

SELECT * FROM orders WHERE product_id IN (1, 2, 3, 4) AND user_id = 5678;

I have an index on product_id, user_id, and the pair (product_id, user_id). However, the server does not use any of these indexes:

+----+-------------+--------+------+-------------------------------------------------------------------------------------------+------+---------+------+------+-------------+
| id | select_type | table  | type | possible_keys                                                                             | key  | key_len | ref  | rows | Extra       |
+----+-------------+--------+------+-------------------------------------------------------------------------------------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | orders | ALL  | index_orders_on_product_id,index_orders_on_user_id,index_orders_on_product_id_and_user_id | NULL | NULL    | NULL |    6 | Using where |
+----+-------------+--------+------+-------------------------------------------------------------------------------------------+------+---------+------+------+-------------+

(There are only 6 rows on development, so whatever, but on production there are about 400k rows, so execution takes about 0.25s, and this query is fired pretty darn often.)

How can I avoid a plain WHERE filter over a full table scan here? I suppose I could send one query per product_id, which would likely be faster than this version, but the number of products could be very high, so doing it in a single query would be much preferable. This query is generated by Rails, so I'm a bit limited in how much I can restructure the query itself. Thanks!

Answer 1

For optimal performance of this particular query on your production table (with 400k rows), you need a composite index on {user_id, product_id}, in that order.
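As a minimal sketch of that suggestion in MySQL syntax: the index name below is hypothetical (made up to match the naming pattern in the question's EXPLAIN output), and only the column order is the point. Re-running EXPLAIN afterwards shows whether the optimizer picks the new index on your data.

```sql
-- Hypothetical index name; what matters is that user_id comes first.
CREATE INDEX index_orders_on_user_id_and_product_id
    ON orders (user_id, product_id);

-- Check which index (if any) the optimizer now chooses for the query.
EXPLAIN
SELECT * FROM orders
WHERE product_id IN (1, 2, 3, 4) AND user_id = 5678;
```

On a table with only a handful of rows the optimizer may still prefer a full scan, so the result is only meaningful on production-sized data.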

Ideally, this would be the only index, and you would use InnoDB so the table is clustered. Every additional index incurs a penalty when modifying data, and on top of that, secondary indexes in clustered tables are even more expensive than secondary indexes in heap-based tables.

To understand why user_id (and not product_id) should be at the leading edge of the index, please take a look at the Anatomy of an Index. Essentially, since the WHERE clause searches for only one user_id, putting it first places the related product_id values close together in the index.

(An index on {product_id, user_id} would also work, but it would "scatter" the "target" index nodes less favorably.)

Answer 2

When there are so few rows in the table, the server does not use indexes, because a full scan is cheaper. Try checking the query against the data in your prod environment and see whether it uses one of your indexes.
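If you want to confirm on the small development table that an index is usable at all, one option is a MySQL index hint. This is only a diagnostic sketch, not something the answer above prescribes; the index name is taken from the possible_keys column in the question's EXPLAIN output.

```sql
-- Diagnostic only: ask the optimizer to treat a table scan as very expensive
-- so that it uses the named index if it possibly can.
EXPLAIN
SELECT * FROM orders
    FORCE INDEX (index_orders_on_product_id_and_user_id)
WHERE product_id IN (1, 2, 3, 4) AND user_id = 5678;
```

Since the query is generated by Rails, the hint is probably more useful for checking the plan by hand than as a permanent change.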

Also, note that you can eliminate one of your indexes, index_orders_on_product_id, because you already have another index that starts with the product_id column.
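A sketch of dropping the redundant single-column index in MySQL, assuming the index names match those shown in the question's EXPLAIN output and that the composite index starting with product_id is kept:

```sql
-- List the existing indexes on the table before changing anything.
SHOW INDEX FROM orders;

-- The composite (product_id, user_id) index already covers lookups
-- by product_id alone, so the single-column index is redundant.
DROP INDEX index_orders_on_product_id ON orders;
```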


2 answers

solution1
5 2012-03-21 23:18:22

solution2
4 ACCEPTED 2012-03-21 22:42:33
