Optimizing a MySQL index on an ordered range query

Question

I'm looking for help with some problematic MySQL queries I currently have running against my server. My goal is to show the most expensive eBay items with end times less than one month ago.

I'm using MySQL 5.1.

My query is as follows ('ebay_items' has ~350,000 rows):

explain SELECT `ebay_items`.* FROM `ebay_items` 
WHERE (endtime > NOW()-INTERVAL 1 MONTH) ORDER BY price desc\G;

yields:

*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ebay_items
type: range
possible_keys: endtime
key: endtime
key_len: 9
ref: NULL
rows: 71760
Extra: Using where; Using filesort
1 row in set (0.00 sec)

This query results in an expensive 'filesort' using 71760 rows.

show indexes on ebay_items;

yields (I've only included the index in question, 'endtime'):

*************************** 7. row ***************************
Table: ebay_items
Non_unique: 1
Key_name: endtime
Seq_in_index: 1
Column_name: endtime
Collation: A
Cardinality: 230697
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment: 
*************************** 8. row ***************************
Table: ebay_items
Non_unique: 1
Key_name: endtime
Seq_in_index: 2
Column_name: price
Collation: A
Cardinality: 230697
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:

Only the 'endtime' key of the composite endtime index (endtime, price) is being used. As far as I know, MySQL will not make use of a composite index effectively when dealing with a range query in conjunction with an 'order by' clause.
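To make the limitation concrete, here is a minimal sketch in Java that models an (endtime, price) index as a sorted list. It is a toy model, not MySQL's actual storage; the class name CompositeIndexSketch, the Entry type, and the sample values are all made up for illustration. It shows that entries picked out by a range on endtime come back ordered by endtime first, so their prices are not in order and a separate sort (the filesort) is still needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model of an (endtime, price) index; this is NOT how MySQL stores data.
final class CompositeIndexSketch {
    // One index entry: endtime as a day number, price as a whole number.
    static final class Entry {
        final int endtime;
        final int price;
        Entry(int endtime, int price) { this.endtime = endtime; this.price = price; }
    }

    // Hypothetical sample rows, in no particular order.
    static List<Entry> sampleRows() {
        return Arrays.asList(new Entry(3, 50), new Entry(1, 900), new Entry(2, 10),
                             new Entry(3, 700), new Entry(2, 400));
    }

    // Index order: sorted by endtime, and by price only within equal endtimes.
    static List<Entry> indexOrder(List<Entry> rows) {
        List<Entry> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingInt((Entry e) -> e.endtime)
                              .thenComparingInt(e -> e.price));
        return sorted;
    }

    // Range scan: keep entries with endtime > cutoff, preserving index order.
    static List<Entry> rangeScan(List<Entry> index, int cutoff) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : index) {
            if (e.endtime > cutoff) out.add(e);
        }
        return out;
    }

    // True if the prices are already in descending order (no extra sort needed).
    static boolean isPriceDescending(List<Entry> entries) {
        for (int i = 1; i < entries.size(); i++) {
            if (entries.get(i - 1).price < entries.get(i).price) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Entry> scanned = rangeScan(indexOrder(sampleRows()), 1);
        for (Entry e : scanned) {
            System.out.println("endtime=" + e.endtime + " price=" + e.price);
        }
        System.out.println("already sorted by price desc: " + isPriceDescending(scanned));
    }
}
```

With these sample rows the range scan returns prices in the order 10, 400, 50, 700: sorted within each endtime value, but not across the whole range.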

Has anyone found a good workaround for this issue? I'd primarily like to solve it at the DB level (either by a smarter use of indexes or schema changes) but I'm open to suggestions.

One way that I could avoid the range query is to have a background task cycling through every X hours and marking an enum type field on ebay_items as '< 1 day old', '< 1 week old', '< 1 month old', etc. I was hoping to solve the problem in a cleaner way.
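As a rough sketch of what that background task would compute for each row, the Java snippet below classifies an item by the age of its end time. The class name AgeBucketSketch, the enum constants, and the choice of 30 days for "1 month" are assumptions made for illustration; the actual enum column and the UPDATE statement that writes it are not shown.

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Hypothetical helper for the background task; names are illustrative only.
final class AgeBucketSketch {
    enum AgeBucket { UNDER_1_DAY, UNDER_1_WEEK, UNDER_1_MONTH, OLDER }

    // Classifies an item by how long ago it ended.
    // Assumption: "1 month" is approximated as 30 days.
    // Items whose endtime is still in the future have a negative age
    // and therefore land in UNDER_1_DAY.
    static AgeBucket classify(LocalDateTime endtime, LocalDateTime now) {
        Duration age = Duration.between(endtime, now);
        if (age.compareTo(Duration.ofDays(1)) < 0) return AgeBucket.UNDER_1_DAY;
        if (age.compareTo(Duration.ofDays(7)) < 0) return AgeBucket.UNDER_1_WEEK;
        if (age.compareTo(Duration.ofDays(30)) < 0) return AgeBucket.UNDER_1_MONTH;
        return AgeBucket.OLDER;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2011, 12, 4, 10, 0);
        System.out.println(classify(LocalDateTime.of(2011, 12, 4, 1, 0), now));
        System.out.println(classify(LocalDateTime.of(2011, 12, 1, 10, 0), now));
        System.out.println(classify(LocalDateTime.of(2011, 11, 10, 10, 0), now));
        System.out.println(classify(LocalDateTime.of(2011, 10, 1, 10, 0), now));
    }
}
```

The buckets here are mutually exclusive, so "everything less than a month old" would still span three bucket values; storing cumulative flags instead would be another option.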

Is there any way to execute a MySQL range query with an ORDER BY clause in an efficient manner?

Huge thanks for your help!

Edit: Kohányi Róbert made the good point that I should clarify the exact problem I was having with the query. The query results in the disk I/O being pegged for its duration. If several of these queries are running simultaneously, processes get backed up and the machine locks up. My assumption is that the filesort is eating the I/O.

I should also mention that the table is using the MyISAM engine. Would it be more performant and less I/O intensive to use the InnoDB engine? Thanks again.

Answer 1

Introduction

I like your question so I've played a bit with MySQL and tried to find the source of the problem. For that, I've created some tests.

Data

I've generated 100,000 rows of sample data using a tool called Random Data Generator (the documentation is a bit outdated, I think, but it works). The configuration file I've passed to gendata.pl is as follows.

$tables = {
  rows => [100000],
  names => ['ebay_items'],
  engines => ['MyISAM'],
  pk => ['int auto_increment']
};

$fields = {
  types => ['datetime', 'int'],
  indexes => [undef]
};

$data = {
  numbers => [
    'tinyint unsigned', 
    'smallint unsigned', 
    'smallint unsigned',
    'mediumint unsigned'
  ],
  temporals => ['datetime']
};

I ran two separate batches of tests: one that used a MyISAM table, and another that used InnoDB. (So basically you replace MyISAM with InnoDB in the above snippet.)

Table

The tool creates a table where the columns are called pk, col_datetime and col_int. I've renamed them to match your table's columns. The resulting table is just below.

+---------+----------+------+-----+---------+----------------+
| Field   | Type     | Null | Key | Default | Extra          |
+---------+----------+------+-----+---------+----------------+
| endtime | datetime | YES  | MUL | NULL    |                |
| id      | int(11)  | NO   | PRI | NULL    | auto_increment |
| price   | int(11)  | YES  | MUL | NULL    |                |
+---------+----------+------+-----+---------+----------------+

Indices

The tool creates no indices, because I wanted to create them by hand.

CREATE INDEX `endtime` ON `ebay_items` (endtime, price);
CREATE INDEX `price` ON `ebay_items` (price, endtime);
CREATE INDEX `endtime_only` ON `ebay_items` (endtime);
CREATE INDEX `price_only` ON `ebay_items` (price);

Query

The query I've used.

SELECT `ebay_items`.* 
FROM `ebay_items`  
FORCE INDEX (`endtime|price|endtime_only|price_only`)
WHERE (`endtime` > '2009-01-01' - INTERVAL 1 MONTH) 
ORDER BY `price` DESC

(Four different queries, each using one of the indices. I've used 2009-01-01 instead of NOW() because the tool seems to generate dates around 2009.)
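The `endtime|price|endtime_only|price_only` notation in the query stands for "pick one of these names"; it is not something MySQL accepts literally. A small Java sketch of how the four concrete statements could be generated follows. The class name ForceIndexQueries and its method names are made up for illustration; the SQL text itself is taken from the query above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that expands the query template into four statements.
final class ForceIndexQueries {
    // The four indices created in the "Indices" section.
    static final String[] INDEXES = {"endtime", "price", "endtime_only", "price_only"};

    // Builds the test query with exactly one index forced.
    static String queryFor(String indexName) {
        return "SELECT `ebay_items`.* FROM `ebay_items` "
             + "FORCE INDEX (`" + indexName + "`) "
             + "WHERE (`endtime` > '2009-01-01' - INTERVAL 1 MONTH) "
             + "ORDER BY `price` DESC";
    }

    // Returns one query per index, in the order listed in INDEXES.
    static List<String> allQueries() {
        List<String> queries = new ArrayList<>();
        for (String index : INDEXES) {
            queries.add(queryFor(index));
        }
        return queries;
    }

    public static void main(String[] args) {
        // Prefix each statement with EXPLAIN to see the plan instead of the rows.
        for (String q : allQueries()) {
            System.out.println("EXPLAIN " + q + ";");
        }
    }
}
```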

Explain

Here is the output of EXPLAIN for the query above, for each index, on a MyISAM table (top) and an InnoDB table (bottom).

endtime

           id: 1
  select_type: SIMPLE
        table: ebay_items
         type: range
possible_keys: endtime
          key: endtime
      key_len: 9
          ref: NULL
         rows: 25261
        Extra: Using where; Using filesort

           id: 1
  select_type: SIMPLE
        table: ebay_items
         type: range
possible_keys: endtime
          key: endtime
      key_len: 9
          ref: NULL
         rows: 21026
        Extra: Using where; Using index; Using filesort

price

           id: 1
  select_type: SIMPLE
        table: ebay_items
         type: index
possible_keys: NULL
          key: price
      key_len: 14
          ref: NULL
         rows: 100000
        Extra: Using where

           id: 1
  select_type: SIMPLE
        table: ebay_items
         type: index
possible_keys: NULL
          key: price
      key_len: 14
          ref: NULL
         rows: 100226
        Extra: Using where; Using index

endtime_only

           id: 1
  select_type: SIMPLE
        table: ebay_items
         type: range
possible_keys: endtime_only
          key: endtime_only
      key_len: 9
          ref: NULL
         rows: 11666
        Extra: Using where; Using filesort

           id: 1
  select_type: SIMPLE
        table: ebay_items
         type: range
possible_keys: endtime_only
          key: endtime_only
      key_len: 9
          ref: NULL
         rows: 21270
        Extra: Using where; Using filesort

price_only

           id: 1
  select_type: SIMPLE
        table: ebay_items
         type: index
possible_keys: NULL
          key: price_only
      key_len: 5
          ref: NULL
         rows: 100000
        Extra: Using where

           id: 1
  select_type: SIMPLE
        table: ebay_items
         type: index
possible_keys: NULL
          key: price_only
      key_len: 5
          ref: NULL
         rows: 100226
        Extra: Using where

Based on these results I decided to use the endtime_only index for my tests, since I had to run the query against both a MyISAM and an InnoDB table. As you can see, though, the most logical index, endtime, seems to be the best.

Test

For testing the efficiency of the query (regarding the generated I/O activity) with a MyISAM and InnoDB table I've written the following simple Java program.

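import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.TimeUnit;

// The class name is arbitrary; the original snippet showed only the members.
public class FilesortTest {
  static final String J = "jdbc:mysql://127.0.0.1:3306/test?user=root&password=root";
  static final String Q = "SELECT * FROM ebay_items FORCE INDEX (endtime_only) WHERE (endtime > '2009-01-01'-INTERVAL 1 MONTH) ORDER BY price desc;";

  // Runs the query 1000 times, opening a fresh connection for each run.
  public static void main(String[] args) throws InterruptedException {
    for (int i = 0; i < 1000; i++)
      try (Connection c = DriverManager.getConnection(J);
          Statement s = c.createStatement()) {
        TimeUnit.MILLISECONDS.sleep(10L); // short pause before each query
        s.execute(Q);
      } catch (SQLException ex) {
        ex.printStackTrace();
      }
  }
}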

Setup

I was running the Windows binary of MySQL 5.5 on a Dell Vostro 1015 laptop, Intel Core Duo T6670 @ 2.20 GHz, 4 GB RAM. The Java program was communicating with the MySQL server process via TCP/IP.

State

I've captured the state of the mysqld process before and after running my tests against the table using MyISAM and InnoDB (using Process Explorer).

Before

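[Screenshot: mysqld Performance tab]

[Screenshot: mysqld Disk and Network tab]

After (MyISAM)

[Screenshot: mysqld Performance tab, MyISAM]

[Screenshot: mysqld Disk and Network tab, MyISAM]

After (InnoDB)

[Screenshot: mysqld Performance tab, InnoDB]

[Screenshot: mysqld Disk and Network tab, InnoDB]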

Conclusion

Basically, the two runs differ only in the number of individual I/O reads, which is quite large when the table uses the MyISAM engine. Both tests ran for 50–60 seconds. The CPU's maximum load was around 42 percent with MyISAM and around 38 percent with InnoDB.
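For a rough sense of scale, the total runtime can be turned into a per-iteration figure. The test loop runs 1000 iterations and sleeps 10 ms in each, so about 10 seconds of the total is sleep. The Java sketch below does that arithmetic; the class and method names are made up, and the result lumps together connection setup, query execution, and result transfer, so treat it as an estimate only (the real sleep may also run longer than 10 ms depending on timer granularity).

```java
// Back-of-the-envelope arithmetic for the test loop; names are illustrative.
final class PerIterationCost {
    // Average milliseconds spent per iteration, excluding the fixed sleep.
    static double avgMillisExcludingSleep(double totalSeconds, int iterations, long sleepMillis) {
        double totalMillis = totalSeconds * 1000.0;
        double sleptMillis = (double) iterations * sleepMillis;
        return (totalMillis - sleptMillis) / iterations;
    }

    public static void main(String[] args) {
        // 1000 iterations and a 10 ms sleep, as in the test program.
        System.out.println(avgMillisExcludingSleep(50, 1000, 10L)); // lower bound of the observed runtime
        System.out.println(avgMillisExcludingSleep(60, 1000, 10L)); // upper bound of the observed runtime
    }
}
```

Under these assumptions each iteration costs roughly 40 to 50 ms beyond the sleep, for both engines.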

I'm not quite sure what the implications of the high number of I/O reads are, but in this case smaller is (probably) better. If your table has more columns than the ones you've specified, and you have a non-default MySQL configuration (regarding buffer sizes and such), it's possible that MySQL would use disk resources.

solution1
8 2011-12-04 10:34:21
