
MySQL query statement index tuning

I'm working out how to implement a global leaderboard for a Facebook racing game my company has released. What I'd like to do is store each player's userID and their time for a race. I have a table like this:

+--------+-----------------------+------+-----+---------+-------+
| Field  | Type                  | Null | Key | Default | Extra |
+--------+-----------------------+------+-----+---------+-------+
| userID | mediumint(8) unsigned | NO   | PRI | 0       |       |
| time   | time                  | YES  | MUL | NULL    |       |
+--------+-----------------------+------+-----+---------+-------+

And a sample set of data:

+--------+----------+
| userID | time     |
+--------+----------+
| 505610 | 10:10:10 |
| 544222 | 10:10:10 |
| 547278 | 10:10:10 |
| 659241 | 10:10:10 |
| 681087 | 10:10:10 |
+--------+----------+

My queries will be coming from PHP. If I assume I have unlimited resources, I could do this:

// Conceptual only: as written, $q2 and $q4 are not valid MySQL
// (WHERE must come before ORDER BY, and the "rank" alias cannot be used in WHERE).
$q1 = "Set @rank := 0";
$q2 = "select @rank:=@rank+1 as rank,userID,time from highscore order by time asc where userID=$someUserID";
$q3 = "Set @rank := 0";
$q4 = "select @rank:=@rank+1 as rank,userID,time from highscore order by time asc where rank > $rankFromSecondQuery - 10 and rank < $rankFromSecondQuery + 10";

But I don't have unlimited resources, and since this is going into a social game on Facebook, it has to scale to millions of players. After a few days of searching Google, I've been able to get my queries down to this:

$q5 = "select rank,userID,time from (select @rank:=0) r, (select @rank:=@rank+1 as rank,userID,time from highscore order by time asc) as myMine where userID=$someUserID";
$q6 = "select rank,userID,time from (select @rank:=0) r, (select @rank:=@rank+1 as rank,userID,time from highscore order by time asc) as myMine where rank > $rankFromFirstQuery - 10 and rank < $rankFromFirstQuery + 10";

This works, but it isn't pretty: the average runtime is approximately 2.3 seconds per query.

EDIT: Here's what $q5 and $q6 return when I run them:

mysql> select rank,userID,time from (select @rank:=0) r, (select @rank:=@rank+1 as rank,userID,time from highscore order by time asc) as myMine where userID=11345;                                                                          
+--------+--------+----------+
| rank   | userID | time     |
+--------+--------+----------+
| 423105 |  11345 | 12:47:23 |
+--------+--------+----------+
1 row in set (2.42 sec)

mysql> select rank,userID,time from (select @rank:=0) r, (select @rank:=@rank+1 as rank,userID,time from highscore order by time asc) as myMine where rank>423100 and rank<423110;
+--------+---------+----------+
| rank   | userID  | time     |
+--------+---------+----------+
| 423101 | 2416665 | 12:47:22 |
| 423102 | 2419720 | 12:47:22 |
| 423103 | 2426606 | 12:47:22 |
| 423104 | 2488517 | 12:47:22 |
| 423105 |   11345 | 12:47:23 |
| 423106 |   92350 | 12:47:23 |
| 423107 |   94277 | 12:47:23 |
| 423108 |  114685 | 12:47:23 |
| 423109 |  135434 | 12:47:23 |
+--------+---------+----------+
9 rows in set (2.58 sec)

Here's the EXPLAIN output for $q5; the one for $q6 looks almost identical:

mysql> explain select rank,userID,time from (select @rank:=0) r, (select @rank:=@rank+1 as rank,userID,time from highscore order by time asc) as myMine where userID=11345;
+----+-------------+------------+--------+---------------+----------+---------+------+---------+----------------+
| id | select_type | table      | type   | possible_keys | key      | key_len | ref  | rows    | Extra          |
+----+-------------+------------+--------+---------------+----------+---------+------+---------+----------------+
|  1 | PRIMARY     | <derived2> | system | NULL          | NULL     | NULL    | NULL |       1 |                |
|  1 | PRIMARY     | <derived3> | ALL    | NULL          | NULL     | NULL    | NULL | 2500000 | Using where    |
|  3 | DERIVED     | highscore  | index  | NULL          | idx_time | 4       | NULL | 2500842 | Using index    |
|  2 | DERIVED     | NULL       | NULL   | NULL          | NULL     | NULL    | NULL |    NULL | No tables used |
+----+-------------+------------+--------+---------------+----------+---------+------+---------+----------------+

So ultimately, what I'd really like is to get this down to a single query, so that I can bring the execution time down with a high-CPU server or two. Failing that, I'd like to find a way to make the part of the query behind the <derived3> line of the EXPLAIN output, which currently scans every row in the table, use an index instead.

Here are a few of the queries I've tried so far, without success:

select rank,userID,time from (select @rank:=0) r, (select @playerRank := rank from (select @rank:=@rank+1 as rank,userID,time from highscore order by time asc) as myMine where userID=11345) as myFoo where @playerRank>423100 and @playerRank<423110;
select rank,userID,time from (select @playerRank := rank from (select @rank := 0) r, (select @rank:=@rank+1 as rank,userID,time from highscore order by time asc) as myMine where userID=11345) as myFoo where @playerRank>423100 and @playerRank<423110;
select * from (select @rank:=0) r, (select @playerRank := userID from (select @rank:=@rank+1 as rank,userID,time from highscore order by time asc) as myMine where userID=11345) as myFoo where @playerRank>423100 and @playerRank<423110;

The first two gave me "ERROR 1054 (42S22): Unknown column 'rank' in 'field list'", and the third just returned an empty set instead of the data I was looking for.

Does anyone have ideas on how to either make the two queries above use an index so the execution time decreases, or combine them into one so I only have to suffer the painful execution time once? I'm also open to tuning and optimizations such as tweaking MySQL configuration settings and/or using something like Percona, if anyone has experience with that and would like to share it.

After running $q5 you know the user's rank; after that, you can use LIMIT to fetch the right rows:

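// Ranks are 1-based but LIMIT offsets are 0-based: rank r sits at offset r - 1.
// Start 10 ranks above the player and fetch 21 rows (10 above, the player, 10 below).
$lowest_offset_to_fetch = max(0, (int)$rankFromFirstQuery - 11);
$q6l = "SELECT userID, time
        FROM highscore
        ORDER BY time ASC
        LIMIT {$lowest_offset_to_fetch}, 21";

// Assumes an existing PDO connection in $pdo (hypothetical; any DB layer works).
$stmt = $pdo->query($q6l);

$result = [];
$current_rank = $lowest_offset_to_fetch + 1; // rank of the first row returned
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $row['rank'] = $current_rank++;
    $result[] = $row;
}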
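A self-contained way to check the offset arithmetic: rank r sits at 0-based offset r - 1, so a window of 10 players on each side starts at offset r - 11 and spans 21 rows. The PHP sketch below assumes the pdo_sqlite extension and uses an in-memory SQLite table shaped like highscore (with time stored as an HH:MM:SS string) instead of a MySQL server; fetchWindow() is a hypothetical helper, and the userID tie-breaker in ORDER BY is an addition so that equal times come back in a stable order.

```php
<?php
// Self-contained sketch: in-memory SQLite through PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE highscore (userID INTEGER PRIMARY KEY, time TEXT)');
$pdo->exec('CREATE INDEX idx_time ON highscore (time)');

// Hypothetical sample data: user $i finished in $i seconds, so user $i has rank $i.
$ins = $pdo->prepare('INSERT INTO highscore (userID, time) VALUES (?, ?)');
for ($i = 1; $i <= 100; $i++) {
    $ins->execute([$i, gmdate('H:i:s', $i)]);
}

// Return the players ranked $rank - $radius .. $rank + $radius (clipped at rank 1),
// each tagged with its rank. Ranks are 1-based, LIMIT offsets are 0-based.
function fetchWindow(PDO $pdo, $rank, $radius = 10)
{
    $offset = max(0, (int)$rank - (int)$radius - 1);
    $count  = 2 * (int)$radius + 1;
    $stmt = $pdo->query("SELECT userID, time FROM highscore
                         ORDER BY time ASC, userID ASC
                         LIMIT {$count} OFFSET {$offset}");
    $rows = [];
    $current = $offset + 1; // rank of the first row returned
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $row['rank'] = $current++;
        $rows[] = $row;
    }
    return $rows;
}

$window = fetchWindow($pdo, 50);
echo $window[0]['rank'], ' .. ', $window[count($window) - 1]['rank'], PHP_EOL; // 40 .. 60
```

The window only lines up with the rank from $q5 if both queries break ties the same way; $q5 orders by time alone, so players with equal times may be numbered differently between the two queries.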

You can obtain the rank first using COUNT(); this should perform a bit better for the first query:

SELECT COUNT(h.userID) AS `rank`, h2.userID, h2.time
   FROM highscore h2
   JOIN highscore h ON (h.time <= h2.time)
   WHERE h2.userID = ?
   GROUP BY h2.userID, h2.time

Then you could use Puggan's technique to query the nearby rankings:

SELECT ... ORDER BY time LIMIT $lowest_rank, 21
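The COUNT() idea from this answer can also be written as a scalar subquery instead of a self-join, which makes the index-friendly shape easier to see. A sketch, assuming pdo_sqlite with an in-memory table in place of MySQL and a hypothetical playerRank() helper; like the join, it counts ties as ahead of the player, so players with equal times share the lower (worse) rank.

```php
<?php
// Self-contained sketch: in-memory SQLite through PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE highscore (userID INTEGER PRIMARY KEY, time TEXT)');
$pdo->exec('CREATE INDEX idx_time ON highscore (time)');
$pdo->exec("INSERT INTO highscore (userID, time) VALUES
    (505610, '10:10:10'), (544222, '10:10:12'), (547278, '10:09:59'),
    (659241, '10:10:12'), (681087, '10:11:00')");

// Rank = number of players whose time is <= this player's time (the player included).
// The outer WHERE is a plain range condition on `time`, so it can be served by
// idx_time instead of numbering every row with @rank.
function playerRank(PDO $pdo, $userID)
{
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM highscore
                           WHERE time <= (SELECT time FROM highscore WHERE userID = ?)');
    $stmt->execute([$userID]);
    $rank = (int)$stmt->fetchColumn();
    return $rank > 0 ? $rank : null; // 0 only happens when the userID does not exist
}

echo playerRank($pdo, 505610), PHP_EOL; // 2
echo playerRank($pdo, 659241), PHP_EOL; // 4 (tied with 544222)
```

It still reads every index entry ahead of the player, so the cost grows with the rank, but it should avoid materializing the full 2.5-million-row derived table that shows up as <derived3> in the question's EXPLAIN output.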

I'd like to propose an alternative solution to what you're trying to achieve.

Make a separate table to store the ranks. Don't compute a rank every time a user wants to know it, and don't store it in the existing table. Keeping the ranks in a separate table should ease lock contention when score updates compete with the rank computation.

Recalculate the ranks at a regular interval. When you do, truncate the ranks table and recreate it from scratch, either with a bulk load operation (LOAD DATA INFILE) or by making it a MyISAM table (which is fast when inserting at the end of the table). Either way should be relatively fast at actually writing out the table; faster, at least, than updating millions of rows in a table that's already in place. Both methods make your ranks table brittle and vulnerable to loss in the event of a crash, but that's OK because this is essentially transient data: as long as your scores table is stable, you're safe. By recalculating at fixed intervals, you avoid the problem of having to do the calculation more and more often as the number of plays grows, until you hit a wall.
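A minimal PHP sketch of the separate ranks table and its periodic rebuild follows. Assumptions: pdo_sqlite with an in-memory database so it runs standalone (SQLite has no TRUNCATE TABLE, so DELETE FROM stands in for it; on MySQL you would truncate and bulk-load as described), a hypothetical ranks table keyed by rank, and hypothetical helpers rebuildRanks(), rankOf() and neighbours(). The column is called rnk because RANK is a reserved word in MySQL 8.0.

```php
<?php
// Self-contained sketch: in-memory SQLite through PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE highscore (userID INTEGER PRIMARY KEY, time TEXT)');
$pdo->exec('CREATE INDEX idx_time ON highscore (time)');
// Hypothetical ranks table: rank is the primary key, userID is indexed for lookups.
$pdo->exec('CREATE TABLE ranks (rnk INTEGER PRIMARY KEY, userID INTEGER UNIQUE, time TEXT)');

// Periodic job: throw the old ranks away and renumber everyone from scratch.
function rebuildRanks(PDO $pdo)
{
    $rows = $pdo->query('SELECT userID, time FROM highscore ORDER BY time, userID')
                ->fetchAll(PDO::FETCH_ASSOC);
    $pdo->beginTransaction();
    $pdo->exec('DELETE FROM ranks'); // TRUNCATE TABLE ranks on MySQL
    $ins = $pdo->prepare('INSERT INTO ranks (rnk, userID, time) VALUES (?, ?, ?)');
    foreach ($rows as $i => $row) {
        $ins->execute([$i + 1, $row['userID'], $row['time']]);
    }
    $pdo->commit();
}

// Both lookups below hit an index on `ranks` instead of sorting `highscore`.
function rankOf(PDO $pdo, $userID)
{
    $stmt = $pdo->prepare('SELECT rnk FROM ranks WHERE userID = ?');
    $stmt->execute([$userID]);
    $rnk = $stmt->fetchColumn();
    return $rnk === false ? null : (int)$rnk;
}

function neighbours(PDO $pdo, $rnk, $radius = 10)
{
    $stmt = $pdo->prepare('SELECT rnk, userID, time FROM ranks
                           WHERE rnk BETWEEN ? AND ? ORDER BY rnk');
    $stmt->execute([$rnk - $radius, $rnk + $radius]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$add = $pdo->prepare('INSERT INTO highscore (userID, time) VALUES (?, ?)');
$add->execute([505610, '10:10:10']);
$add->execute([544222, '10:10:12']);
$add->execute([547278, '10:09:59']);
rebuildRanks($pdo);
echo rankOf($pdo, 544222), PHP_EOL; // 3

$add->execute([681087, '10:00:00']); // a new, faster score...
echo rankOf($pdo, 544222), PHP_EOL; // ...is not reflected yet: still 3
rebuildRanks($pdo);
echo rankOf($pdo, 544222), PHP_EOL; // 4 after the next rebuild
```

Assigning the numbers in PHP avoids relying on window functions or on insert order; for millions of rows, the bulk-load route described above is the more realistic choice.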

If the user scores within the top 100, push out their new score right away. Users may want to browse the top 100 to see who has the highest scores; I see little likelihood of anyone wanting to browse the list below that point.
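One way to decide whether a new time qualifies for an immediate push is to count how many other players are strictly faster, stopping once 100 are found so the check stays cheap even for slow times. This is a sketch only: pdo_sqlite with an in-memory table stands in for MySQL, and entersTopN() is a hypothetical helper.

```php
<?php
// Self-contained sketch: in-memory SQLite through PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE highscore (userID INTEGER PRIMARY KEY, time TEXT)');
$pdo->exec('CREATE INDEX idx_time ON highscore (time)');

// Hypothetical sample data: 150 players, user $i finished in $i seconds.
$ins = $pdo->prepare('INSERT INTO highscore (userID, time) VALUES (?, ?)');
for ($i = 1; $i <= 150; $i++) {
    $ins->execute([$i, gmdate('H:i:s', $i)]);
}

// True if fewer than $n *other* players have a strictly faster time. The inner
// LIMIT caps the scan at $n index entries, however slow $time is.
function entersTopN(PDO $pdo, $userID, $time, $n = 100)
{
    // Cast and interpolate $n: execute() binds values as strings, and with PDO's
    // default emulated prepares MySQL would see a quoted LIMIT value.
    $n = (int)$n;
    $stmt = $pdo->prepare("SELECT COUNT(*) FROM (
                               SELECT 1 FROM highscore
                               WHERE time < ? AND userID <> ?
                               LIMIT {$n}) AS faster");
    $stmt->execute([$time, $userID]);
    return (int)$stmt->fetchColumn() < $n;
}

var_dump(entersTopN($pdo, 999, '00:00:50')); // bool(true)
var_dump(entersTopN($pdo, 999, '00:02:00')); // bool(false)
```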

Allow users to see their friends' scores immediately, along with their ranks relative to each other. This is probably the ranking most users care about. I know that when my wife plays a Facebook game, she has no interest in her overall ranking, but she very much wants to know whether she beat her college classmates.
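Ranking a player against their friends only touches a handful of rows, so it can be computed on every request. A sketch, assuming the friend IDs have already been fetched from Facebook by some other means (the $friendIDs list here is made up), pdo_sqlite in place of MySQL, and a hypothetical friendsBoard() helper:

```php
<?php
// Self-contained sketch: in-memory SQLite through PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE highscore (userID INTEGER PRIMARY KEY, time TEXT)');
$pdo->exec("INSERT INTO highscore (userID, time) VALUES
    (505610, '10:10:10'), (544222, '10:10:12'), (547278, '10:09:59'),
    (659241, '10:10:12'), (681087, '10:11:00')");

// Sort the given players by time; 'rank' is their position within this group only.
// Each ID is a primary-key lookup, so the cost depends on the number of friends,
// not on the size of the whole leaderboard.
function friendsBoard(PDO $pdo, array $userIDs)
{
    if (count($userIDs) === 0) {
        return [];
    }
    $placeholders = implode(',', array_fill(0, count($userIDs), '?'));
    $stmt = $pdo->prepare("SELECT userID, time FROM highscore
                           WHERE userID IN ($placeholders)
                           ORDER BY time, userID");
    $stmt->execute(array_values($userIDs));
    $board = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $i => $row) {
        $row['rank'] = $i + 1;
        $board[] = $row;
    }
    return $board;
}

$me = 544222;
$friendIDs = [505610, 681087, 123]; // 123 has no score yet and is simply absent
foreach (friendsBoard($pdo, array_merge([$me], $friendIDs)) as $row) {
    echo $row['rank'], '. ', $row['userID'], ' ', $row['time'], PHP_EOL;
}
// 1. 505610 10:10:10
// 2. 544222 10:10:12
// 3. 681087 10:11:00
```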

After the user's latest play, show the player's overall rank, and those of their friends, as out of date, and load the updated values asynchronously once the next recalculation is ready.

Another consideration: if this game is around for a few years, your scoreboard will end up clogged with old scores from inactive players, especially at the low end. You may want to consider whether it's worthwhile to archive these scores. For example, you could say that any player in the lower 75% of the scoreboard is only included in the ranking if they played within the last 6 months. Then move their scores to an archive table, where they are remembered and can be restored to the scoreboard if the player returns, but don't have to be included in the sort every time you calculate the ranking. Yes, this arguably makes your ranking less "true", but people are just playing for fun anyway. It also has the side effect of making their rankings look better, which is fun too. Some fine print on the scoreboard can briefly mention that old scores aren't included, so everything is still above board.
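The archiving rule could look roughly like this. The sketch assumes a lastPlayed date column that the original table does not have, a highscore_archive table with the same layout, pdo_sqlite in place of MySQL, and hypothetical helpers archiveInactive() and restorePlayer(); the 25% fraction and the cutoff date are parameters, not recommendations.

```php
<?php
// Self-contained sketch: in-memory SQLite through PDO stands in for MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// Hypothetical schema: lastPlayed ('YYYY-MM-DD') is an assumed extra column.
$pdo->exec('CREATE TABLE highscore (userID INTEGER PRIMARY KEY, time TEXT, lastPlayed TEXT)');
$pdo->exec('CREATE TABLE highscore_archive (userID INTEGER PRIMARY KEY, time TEXT, lastPlayed TEXT)');

// Move players outside the top $keepFraction who have not played since $cutoff.
function archiveInactive(PDO $pdo, $cutoff, $keepFraction = 0.25)
{
    $total = (int)$pdo->query('SELECT COUNT(*) FROM highscore')->fetchColumn();
    $keep = (int)ceil($total * $keepFraction);
    if ($keep === 0 || $keep >= $total) {
        return 0;
    }
    // Slowest time still inside the protected top group; ties with it are kept too.
    $threshold = $pdo->query('SELECT time FROM highscore ORDER BY time, userID
                              LIMIT 1 OFFSET ' . ($keep - 1))->fetchColumn();
    $pdo->beginTransaction();
    $copy = $pdo->prepare('INSERT INTO highscore_archive
                           SELECT * FROM highscore WHERE time > ? AND lastPlayed < ?');
    $copy->execute([$threshold, $cutoff]);
    $del = $pdo->prepare('DELETE FROM highscore WHERE time > ? AND lastPlayed < ?');
    $del->execute([$threshold, $cutoff]);
    $pdo->commit();
    return $del->rowCount();
}

// A returning player gets their archived score back on the live board.
function restorePlayer(PDO $pdo, $userID)
{
    $pdo->beginTransaction();
    $pdo->prepare('INSERT INTO highscore SELECT * FROM highscore_archive WHERE userID = ?')
        ->execute([$userID]);
    $pdo->prepare('DELETE FROM highscore_archive WHERE userID = ?')->execute([$userID]);
    $pdo->commit();
}

// Hypothetical data: 8 players; odd-numbered ones last played long ago.
$ins = $pdo->prepare('INSERT INTO highscore VALUES (?, ?, ?)');
for ($i = 1; $i <= 8; $i++) {
    $ins->execute([$i, gmdate('H:i:s', $i), $i % 2 ? '2020-01-01' : '2024-06-01']);
}
echo archiveInactive($pdo, '2024-01-01'), PHP_EOL; // 3 (users 3, 5, 7; user 1 is in the top 25%)
restorePlayer($pdo, 5);
```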
