
Improving performance of MySQL query with multiple WHERE conditions on large tables

I have a MySQL table with the potential for millions of rows of data - in some extreme cases up to 100 million. There's an application I've developed that often queries this data, and I've done what I can to optimize it - for the most part it works very quickly, because we're only searching a very small subset of the data (tied to locations).

Table structure:

CREATE TABLE `prism_actions` (
  `id` int(11) unsigned NOT NULL auto_increment,
  `action_time` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `action_type` varchar(25) NOT NULL,
  `player` varchar(16) NOT NULL,
  `world` varchar(255) NOT NULL,
  `x` int(11) NOT NULL,
  `y` int(11) NOT NULL,
  `z` int(11) NOT NULL,
  `block_id` mediumint(5) unsigned NOT NULL,
  `block_subid` mediumint(5) unsigned NOT NULL,
  `data` varchar(255) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `x` (`x`),
  KEY `action_type` (`action_type`),
  KEY `player` (`player`),
  KEY `block_id` (`block_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1;

I have several basic indexes on the fields we use most in our WHERE clauses, and when a query uses only a single condition, it's quite fast.

The example table I'm running these tests on has 22 million records.

Examples:

SELECT prism_actions.id FROM prism_actions WHERE prism_actions.action_type = 'block-break' LIMIT 1000;
1000 rows in set (0.00 sec)

SELECT prism_actions.id FROM prism_actions WHERE prism_actions.block_id = 2 LIMIT 1000;
1000 rows in set (0.01 sec)

My problem is that with each additional condition in a query (most queries have several conditions), the query takes much longer.

SELECT prism_actions.id FROM prism_actions WHERE prism_actions.action_type = 'block-break' AND prism_actions.block_id = 2 LIMIT 1000;
1000 rows in set (0.79 sec)

0.79 seconds would be acceptable for the full query, but that's only using part of the conditions.

A real query is more like:

SELECT prism_actions.id FROM prism_actions WHERE prism_actions.action_type = 'block-break' AND prism_actions.player = 'viveleroi' AND prism_actions.block_id = 2 LIMIT 1000;
1000 rows in set (2.22 sec)

With a single condition the query ran in 0.01 seconds, with two it took 0.79, and with three it took 2.2 seconds, which is too long.
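From what I understand, MySQL will normally use only one index per table access (index merge aside), so the combined query does one index lookup and then filters the remaining conditions row by row. EXPLAIN shows which index it picks - for example (illustrative, not my actual output):

EXPLAIN SELECT prism_actions.id
FROM prism_actions
WHERE prism_actions.action_type = 'block-break'
  AND prism_actions.block_id = 2
LIMIT 1000;
-- the "key" column will name a single index (block_id or action_type);
-- the other condition is checked against every row that index returns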

I'm going to research how I can better design my indexes, but I'm mostly comfortable with the current db schema and indexes.

However, what can I try to make the conditions faster when used together like that?

Update

I spent time converting the tables to a foreign-key format. The player, action_type, and world column data were moved to separate tables, and the IDs for those rows are now stored in the original table. Migrating the data took several hours.
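For reference, the converted structure looks roughly like this (the lookup table names below are illustrative; each one is just an id/name pair):

CREATE TABLE prism_players (
  player_id int(11) unsigned NOT NULL AUTO_INCREMENT,
  player varchar(16) NOT NULL,
  PRIMARY KEY (player_id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

-- prism_actiontypes and prism_worlds follow the same pattern;
-- prism_actions now stores actiontype_id, player_id and world_id
-- in place of the original varchar columns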

However, re-running the same queries I used earlier, I see a speed increase in some but little change in others.

The converted version of the 0.79-second query above runs at about the same speed:

SELECT prism_actions.id FROM prism_actions WHERE prism_actions.actiontype_id = 1 AND prism_actions.block_id = 2 LIMIT 1000;
1000 rows in set (0.73 sec)

The block_id column still has an index from the original table schema.

Queries with the player_id as a condition were running very slowly, so I added an index to the column, and now those queries are blazing fast.
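The index was added with a statement along these lines:

ALTER TABLE prism_actions ADD INDEX player_id (player_id);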

However, after taking several example queries from a real user and updating them for this table structure, I see no change in speed.

SELECT prism_actions.id FROM prism_actions WHERE (prism_actions.actiontype_id = 2 OR prism_actions.actiontype_id = 1) AND (prism_actions.player_id = 1127) AND prism_actions.action_time >= '2013-02-22 07:47:54' LIMIT 1000;

Previously it took 5.83 sec; it currently takes 5.29 sec.

Edit - it seems to be the timestamp. Taking the timestamp condition out of the above query returns the results in 0.01 seconds, but adding an index on the timestamp does nothing - ideas?
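For anyone answering: one thing I haven't tried yet is a single composite index that leads with the equality columns and puts the timestamp last, so the range condition can use the same index - something like:

-- hypothetical composite index: equality columns first, range column last
CREATE INDEX player_type_time ON prism_actions (player_id, actiontype_id, action_time);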

So far, all I'm really seeing is a slight speed increase in certain areas and a small file-space saving since we no longer store duplicate strings - but nothing that would warrant asking hundreds of users with databases this large to spend a day's worth of time converting data.

Any suggestions for some other way I might index stuff, etc?

Move all the text columns (action_type, player, world) to a separate table.

This will reduce the db size; keep only the reference numbers (IDs) in this table.

This can dramatically improve performance.
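A sketch of the idea (the prism_players lookup table name is made up): resolve the text value to its id once, then filter the big table on the small integer column.

-- resolve the text value to its id once
SELECT player_id FROM prism_players WHERE player = 'viveleroi';

-- then filter the big table on the integer reference column
SELECT id FROM prism_actions WHERE player_id = 1127 LIMIT 1000;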

In MySQL v5.5 you can use PARTITION BY RANGE COLUMNS, e.g.:

CREATE TABLE `prism_actions` (
  `id` int(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  `action_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `action_type` varchar(25) NOT NULL,
  `player` varchar(16) NOT NULL,
  `world` varchar(255) NOT NULL,
  `x` int(11) NOT NULL,
  `y` int(11) NOT NULL,
  `z` int(11) NOT NULL,
  `block_id` mediumint(5) UNSIGNED NOT NULL,
  `block_subid` mediumint(5) UNSIGNED NOT NULL,
  `data` varchar(255) NOT NULL,
  PRIMARY KEY  (`id`, `action_type`, `player`, `block_id`),  -- partition columns must be part of every unique key
  KEY `x` (`x`),
  KEY `action_type` (`action_type`),
  KEY `player` (`player`),
  KEY `block_id` (`block_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1
PARTITION BY RANGE COLUMNS(action_type, player, block_id)(
PARTITION p0 VALUES LESS THAN ('dddddddd','dddddddd',1000000),
PARTITION p1 VALUES LESS THAN ('gggggggg','gggggggg',2000000),
PARTITION p2 VALUES LESS THAN ('jjjjjjjj','jjjjjjjj',3000000),
PARTITION p3 VALUES LESS THAN ('mmmmmmmm','mmmmmmmm',4000000),
PARTITION p4 VALUES LESS THAN ('pppppppp','pppppppp',5000000),
PARTITION p5 VALUES LESS THAN ('ssssssss','ssssssss',6000000),
PARTITION p6 VALUES LESS THAN ('uuuuuuuu','uuuuuuuu',7000000),
PARTITION p7 VALUES LESS THAN (MAXVALUE,MAXVALUE,MAXVALUE)
);

Worst case, any given combination of (action_type, player, block_id) can belong to only one partition, so the server scans just that partition rather than the whole table. Hence it's better than the original query.

For bonus speed, analyse the frequency distribution of the column values and define the partitions accordingly - the partitions above are just rough intervals.
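To confirm that pruning actually happens, MySQL 5.5's EXPLAIN PARTITIONS shows which partitions a given query touches; a sketch against the table above:

EXPLAIN PARTITIONS
SELECT id FROM prism_actions
WHERE action_type = 'block-break'
  AND player = 'viveleroi'
  AND block_id = 2;
-- the "partitions" column should list only the partition(s) that can
-- contain this (action_type, player, block_id) combination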

I would leave your table alone, to avoid needing to join back again after getting your result set. You just need a single index containing all the key columns of the WHERE clause, not an index on each one individually. I would order the index columns to hit the smallest result set first: out of 22 million records, I bet quite a few match block_id = 2, and far fewer match a given player.

So, I would have an index on

create index multipart on prism_actions (player, block_id, action_type);

AS A SINGLE INDEX, not the individual per-field indexes you currently have. This allows the engine to jump directly to a given player (from 22 million rows down to, say, 2,000 entries), then to block_id = 2 (down to 200), then to action_type = 'block-break' (down to 20 records). Those counts are obviously just arbitrary samples, but the compound index should be all you need.
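One caveat on column order (the leftmost-prefix rule): the composite index is only usable when the query constrains its leading column(s). A sketch:

-- can use the multipart index: the leading column (player) is constrained
SELECT id FROM prism_actions
WHERE player = 'viveleroi' AND block_id = 2;

-- cannot use multipart (no condition on player); MySQL falls back to the
-- single-column block_id index or a table scan
SELECT id FROM prism_actions
WHERE block_id = 2 AND action_type = 'block-break';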
