I am using the following MySQL query in a PHP script on a database that contains over 300,000,000 (yes, three hundred million) rows. I know that it is extremely resource intensive and it takes ages to run this one query. Does anyone know how I can either optimise the query or get the information in another way that's quicker?
I need to be able to use any integer between 1 and 15 in place of the 14 in MID(). I also need to be able to match strings of lengths within the same range in the LIKE
clause.
Table Info:
games | longint, unsigned, Primary Key
win | bit(1)
loss | bit(1)
Example Query:
SELECT MID(`game`,14,1) AS `move`,
COUNT(*) AS `games`,
SUM(`win`) AS `wins`,
SUM(`loss`) AS `losses`
FROM `games`
WHERE `game` LIKE '1112223334%'
GROUP BY MID(`game`,1,14)
Thanks in advance for your help!
First, have an index on the game field... :)
The query seems simple and straightforward, but it hides that fact that a datasbase design change is probably required.
In such cases I always prefer to maintain a field that holds aggregated data, either per day, per user, or per any other axis. This way you can have a daily task that aggregates the relevant data and saves it in the database.
If indeed you call this query often, you should use the principle of decreasing the efficiency of insertion for increasing the efficiency of retrieval.
It looks like the game
column is storing two (or possibly more) different things that this query is using:
game
(first 10 characters) MID(
game ,1,14)
(I'm assuming one of the MID
expressions is a typo. I'd split that up so that you don't have to use string operations on the game
column, and also put indexes on the new columns so you can filter and group them properly.
This query is doing a lot of conversions (long to string) and string manipulations that wouldn't be necessary if the table were normalized (as in one piece of information per column instead of multiple like it is now).
Leave the game
column the way it is, and create a game_filter
string column based on it to use in your WHERE clause. Then set up a game_group
column and populate it with the MID
expression on insert. Set up these two columns as your clustered index, first game_filter
, then game_group
.
The query is simple and, aside from making sure there are all the necessary indexes ("game" field obviously), there may be no obvious way to make it faster by rewriting the query only. Some modification of data structures will probably be necessary.
One way: precalculate the sums. Each of these records will most likely have a create_date or an autoincremented key field. Precalculate the sums for all records, where this field is ≤ some X, put results in a side table, and then you only need to calculate for all records > X, then summarize these partial results with your precalculated ones.
You could precompute the MID( game
,14,1) and MID( game
,1,14) and store the first ten digits of the game
in a separate gameid column which is indexed.
It might also be an idea to investigate if you could just store an aggregate table of the precomputed values so you increment the count and wins or losses column on insert instead.
SELECT MID(`game`,14,1) AS `move`,
COUNT(*) AS `games`,
SUM(`win`) AS `wins`,
SUM(`loss`) AS `losses`
FROM `games`
WHERE `game` LIKE '1112223334%'
Create an index on game
:
CREATE INDEX ix_games_game ON games (game)
and rewrite your query as this:
SELECT move,
(
SELECT COUNT(*)
FROM games
WHERE game >= move
AND game < CONCAT(SUBSTRING(move, 1, 13), CHR(ASCII(SUBSTRING(move, 14, 1)) + 1))
),
(
SELECT SUM(win)
FROM games
WHERE game >= move
AND game < CONCAT(SUBSTRING(move, 1, 13), CHR(ASCII(SUBSTRING(move, 14, 1)) + 1))
),
(
SELECT SUM(lose)
FROM games
WHERE game >= move
AND game < CONCAT(SUBSTRING(move, 1, 13), CHR(ASCII(SUBSTRING(move, 14, 1)) + 1))
)
FROM (
SELECT DISTINCT SUBSTRING(q.game, 1, 14) AS move
FROM games
WHERE game LIKE '1112223334%'
) q
This will help to use the index on game
more efficiently.
Can you cache the result set with Memcache or something similar? That would help with repeated hits. Even if you only cache the result set for a few seconds, you might be able to avoid a lot of DB reads.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.