I'm looking for help with some offending MySQL queries I currently have running against my server. My goal is to show the most expensive eBay items whose end times are less than one month ago.
I'm using MySQL 5.1.
My query is as follows ('ebay_items' has ~350,000 rows):
explain SELECT `ebay_items`.* FROM `ebay_items`
WHERE (endtime > NOW()-INTERVAL 1 MONTH) ORDER BY price desc\G;
yields:
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ebay_items
type: range
possible_keys: endtime
key: endtime
key_len: 9
ref: NULL
rows: 71760
Extra: Using where; Using filesort
1 row in set (0.00 sec)
This query results in an expensive 'filesort' using 71760 rows.
show indexes from ebay_items;
yields (I've only included the index in question, 'endtime'):
*************************** 7. row ***************************
Table: ebay_items
Non_unique: 1
Key_name: endtime
Seq_in_index: 1
Column_name: endtime
Collation: A
Cardinality: 230697
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
*************************** 8. row ***************************
Table: ebay_items
Non_unique: 1
Key_name: endtime
Seq_in_index: 2
Column_name: price
Collation: A
Cardinality: 230697
Sub_part: NULL
Packed: NULL
Null: YES
Index_type: BTREE
Comment:
Only the 'endtime' column of the composite endtime index (endtime, price) is being used, as the key_len of 9 shows. As far as I know, MySQL cannot use the second column of a composite index to satisfy an 'order by' clause when the first column is restricted by a range condition.
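To illustrate why I think this happens, here is a toy sketch in Java. It is not how MySQL stores its B-tree; it only models an (endtime, price) index as a list sorted by endtime and then by price. The class and method names are made up for this example, and endtime is reduced to a plain day number. Reading a range of endtime values in index order gives prices that are sorted only within each endtime value, so a separate sort is still needed. With an equality match on endtime, the prices come out already sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class CompositeIndexOrderDemo {
    // Builds a toy (endtime, price) index: entries sorted by endtime, then price.
    // Each int[] is {endtime, price}; endtime is a plain day number (an assumption).
    static List<int[]> buildIndex(List<int[]> rows) {
        List<int[]> index = new ArrayList<>(rows);
        index.sort(Comparator.comparingInt((int[] e) -> e[0]).thenComparingInt(e -> e[1]));
        return index;
    }

    // Returns prices in index order for all entries with from <= endtime <= to.
    static List<Integer> pricesInRange(List<int[]> index, int from, int to) {
        List<Integer> prices = new ArrayList<>();
        for (int[] e : index) {
            if (e[0] >= from && e[0] <= to) prices.add(e[1]);
        }
        return prices;
    }

    // True if the list is already in ascending order, i.e. no extra sort is needed.
    static boolean isSorted(List<Integer> values) {
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i - 1) > values.get(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[]> index = buildIndex(Arrays.asList(
            new int[]{1, 50}, new int[]{1, 10}, new int[]{2, 5},
            new int[]{2, 40}, new int[]{3, 20}));
        List<Integer> range = pricesInRange(index, 2, 3);    // range on endtime
        List<Integer> equality = pricesInRange(index, 2, 2); // equality on endtime
        System.out.println(range + " sorted: " + isSorted(range));       // [5, 40, 20] sorted: false
        System.out.println(equality + " sorted: " + isSorted(equality)); // [5, 40] sorted: true
    }
}
```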
Has anyone found a good workaround for this issue? I'd primarily like to solve it at the DB level (either by a smarter use of indexes or schema changes) but I'm open to suggestions.
One way that I could avoid the range query is to have a background task cycling through every X hours and marking an enum type field on ebay_items as '< 1 day old', '< 1 week old', '< 1 month old', etc. I was hoping to solve the problem in a cleaner way.
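A rough sketch of the classification step such a background task would run, in Java. The enum name, the method name and the OLDER catch-all are my own placeholders, and writing the result back to ebay_items is left out. Note that with mutually exclusive buckets like these, "less than one month old" spans three values, so the query would still have to match several bucket values rather than a single one.

```java
import java.time.LocalDateTime;

class AgeBucketDemo {
    // Bucket labels follow the ones described above; OLDER is an assumed catch-all.
    enum AgeBucket { LESS_THAN_1_DAY, LESS_THAN_1_WEEK, LESS_THAN_1_MONTH, OLDER }

    // Classifies an item by how long ago it ended, relative to 'now'.
    // Mirrors the original condition endtime > NOW() - INTERVAL 1 MONTH,
    // so an endtime in the future also lands in the first bucket.
    static AgeBucket classify(LocalDateTime endtime, LocalDateTime now) {
        if (endtime.isAfter(now.minusDays(1))) return AgeBucket.LESS_THAN_1_DAY;
        if (endtime.isAfter(now.minusWeeks(1))) return AgeBucket.LESS_THAN_1_WEEK;
        if (endtime.isAfter(now.minusMonths(1))) return AgeBucket.LESS_THAN_1_MONTH;
        return AgeBucket.OLDER;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2009, 1, 1, 0, 0);
        System.out.println(classify(now.minusHours(5), now));
        System.out.println(classify(now.minusDays(3), now));
        System.out.println(classify(now.minusDays(20), now));
        System.out.println(classify(now.minusMonths(2), now));
    }
}
```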
Is there any way to execute a MySQL range query with an order by clause in an efficient manner?
Huge thanks for your help!
Edit: Kohányi Róbert made the good point that I should clarify the exact problem I was having with the query. The query results in the disk I/O being pegged for its duration. If several of these queries are running simultaneously, processes get backed up and the machine locks up. My assumption is that the filesort is eating the I/O.
I should also mention that the table is using the MyISAM engine. Would it be more performant and less I/O intensive to use the InnoDB engine? Thanks again.
I like your question so I've played a bit with MySQL and tried to find the source of the problem. For that, I've created some tests.
I generated 100,000 rows of sample data using a tool called Random Data Generator (the documentation is a bit outdated, I think, but it works). The configuration file I passed to gendata.pl is as follows.
$tables = {
    rows => [100000],
    names => ['ebay_items'],
    engines => ['MyISAM'],
    pk => ['int auto_increment']
};
$fields = {
    types => ['datetime', 'int'],
    indexes => [undef]
};
$data = {
    numbers => [
        'tinyint unsigned',
        'smallint unsigned',
        'smallint unsigned',
        'mediumint unsigned'
    ],
    temporals => ['datetime']
};
I ran two separate batches of tests: one that used a MyISAM table, and another that used InnoDB. (So basically you replace MyISAM with InnoDB in the above snippet.)
The tool creates a table where the columns are called pk, col_datetime and col_int. I've renamed them to match your table's columns. The resulting table is just below.
+---------+----------+------+-----+---------+----------------+
| Field   | Type     | Null | Key | Default | Extra          |
+---------+----------+------+-----+---------+----------------+
| endtime | datetime | YES  | MUL | NULL    |                |
| id      | int(11)  | NO   | PRI | NULL    | auto_increment |
| price   | int(11)  | YES  | MUL | NULL    |                |
+---------+----------+------+-----+---------+----------------+
The tool creates no indices, because I wanted to create them by hand.
CREATE INDEX `endtime` ON `ebay_items` (endtime, price);
CREATE INDEX `price` ON `ebay_items` (price, endtime);
CREATE INDEX `endtime_only` ON `ebay_items` (endtime);
CREATE INDEX `price_only` ON `ebay_items` (price);
The query I've used.
SELECT `ebay_items`.*
FROM `ebay_items`
FORCE INDEX (`endtime`) -- repeated with `price`, `endtime_only` and `price_only`
WHERE (`endtime` > '2009-01-01' - INTERVAL 1 MONTH)
ORDER BY `price` DESC
(That is four different queries, each forcing one of the indices. I used 2009-01-01 instead of NOW() because the tool seems to generate dates around 2009.)
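To spell out the four variants, here is a small sketch that builds one query string per index. The class and method names are placeholders of mine; the index names and the query text are the ones used above.

```java
import java.util.ArrayList;
import java.util.List;

class ForceIndexQueries {
    // The four indices created by hand above.
    static final String[] INDEXES = {"endtime", "price", "endtime_only", "price_only"};

    // Builds the test query once per index, changing only the FORCE INDEX hint.
    static List<String> build() {
        List<String> queries = new ArrayList<>();
        for (String index : INDEXES) {
            queries.add("SELECT `ebay_items`.* FROM `ebay_items`"
                + " FORCE INDEX (`" + index + "`)"
                + " WHERE (`endtime` > '2009-01-01' - INTERVAL 1 MONTH)"
                + " ORDER BY `price` DESC");
        }
        return queries;
    }

    public static void main(String[] args) {
        // Prefix each query with EXPLAIN to inspect the chosen plan.
        for (String query : build()) {
            System.out.println("EXPLAIN " + query + ";");
        }
    }
}
```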
Here is the output of EXPLAIN for the query above for each index, first on the MyISAM table and then on the InnoDB table.
endtime index, MyISAM:
id: 1
select_type: SIMPLE
table: ebay_items
type: range
possible_keys: endtime
key: endtime
key_len: 9
ref: NULL
rows: 25261
Extra: Using where; Using filesort
endtime index, InnoDB:
id: 1
select_type: SIMPLE
table: ebay_items
type: range
possible_keys: endtime
key: endtime
key_len: 9
ref: NULL
rows: 21026
Extra: Using where; Using index; Using filesort
price index, MyISAM:
id: 1
select_type: SIMPLE
table: ebay_items
type: index
possible_keys: NULL
key: price
key_len: 14
ref: NULL
rows: 100000
Extra: Using where
price index, InnoDB:
id: 1
select_type: SIMPLE
table: ebay_items
type: index
possible_keys: NULL
key: price
key_len: 14
ref: NULL
rows: 100226
Extra: Using where; Using index
endtime_only index, MyISAM:
id: 1
select_type: SIMPLE
table: ebay_items
type: range
possible_keys: endtime_only
key: endtime_only
key_len: 9
ref: NULL
rows: 11666
Extra: Using where; Using filesort
endtime_only index, InnoDB:
id: 1
select_type: SIMPLE
table: ebay_items
type: range
possible_keys: endtime_only
key: endtime_only
key_len: 9
ref: NULL
rows: 21270
Extra: Using where; Using filesort
price_only index, MyISAM:
id: 1
select_type: SIMPLE
table: ebay_items
type: index
possible_keys: NULL
key: price_only
key_len: 5
ref: NULL
rows: 100000
Extra: Using where
price_only index, InnoDB:
id: 1
select_type: SIMPLE
table: ebay_items
type: index
possible_keys: NULL
key: price_only
key_len: 5
ref: NULL
rows: 100226
Extra: Using where
Based on these, I decided to use the endtime_only index for my tests, because I had to run the queries against both a MyISAM and an InnoDB table. But as you can see, the most logical index, endtime, seems to be the best.
For testing the efficiency of the query (regarding the generated I/O activity) with a MyISAM and InnoDB table I've written the following simple Java program.
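import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.concurrent.TimeUnit;

// The class name is arbitrary; it only wraps the fields and main method.
public class QueryLoadTest {
    static final String J = "jdbc:mysql://127.0.0.1:3306/test?user=root&password=root";
    static final String Q = "SELECT * FROM ebay_items FORCE INDEX (endtime_only) WHERE (endtime > '2009-01-01'-INTERVAL 1 MONTH) ORDER BY price desc;";

    public static void main(String[] args) throws InterruptedException {
        // Run the query 1000 times, each on a fresh connection, pausing 10 ms before each run.
        for (int i = 0; i < 1000; i++)
            try (Connection c = DriverManager.getConnection(J);
                 Statement s = c.createStatement()) {
                TimeUnit.MILLISECONDS.sleep(10L);
                s.execute(Q);
            } catch (SQLException ex) {
                ex.printStackTrace();
            }
    }
}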
I was running the Windows binary of MySQL 5.5 on a Dell Vostro 1015 laptop (Intel Core Duo T6670 @ 2.20 GHz, 4 GB RAM). The Java program was communicating with the MySQL server process via TCP/IP.
I captured the state of the mysqld process before and after running my tests against the table using MyISAM and InnoDB (using Process Explorer).
Basically the two runs differ only in the number of individual I/O reads, which is quite large when the table uses the MyISAM engine. Both tests ran for 50–60 seconds. The CPU's maximum load was around 42 percent with the MyISAM engine and around 38 percent with InnoDB.
I'm not quite sure what the implications of the high number of I/O reads are, but in this case smaller is (probably) better. If you have some more columns in your table (other than the ones you've specified) and have some non-default MySQL configuration (regarding buffer sizes and such), it's possible that MySQL would use disk resources.