Good Idea/Bad Idea? Using MySQL RAND() outside of a small set of subquery results?

Posted 2011-01-17 04:33:55. Tags: php, mysql, random

So in MySQL, I've read that for large tables with lots of rows, using ORDER BY RAND() is a bad idea (supposedly even for tables with ~500 rows). Slow and inefficient. Lots of row scanning.

How does this (below) seem as an alternative?

SELECT * FROM (...subquery that generally returns a set of fewer than 20 rows...) AS t ORDER BY RAND() LIMIT 8

Instead of using RAND() on a large set of data, I'd select a small subset, and only then would I apply RAND() to those returned rows. In 99.9% of all cases, the subquery seen above should select fewer than 20 rows (and in fact, it's generally fewer than 8).

Curious to hear what people think.

(Just for reference, I'm doing my MySQL stuff with PHP.)

Thanks!

3 Answers

Actually... I ended up running a test and I might have answered my own question. I thought I'd post this information here in case it was useful for anyone else. (If I've done anything wrong here, please let me know!)

This is kind of surprising, and contrary to everything that I've read.

I created a table called TestData with 1 million rows and ran the following query:

SELECT * FROM TestData WHERE number = 41 ORDER BY RAND() LIMIT 8

...and it returned the rows in an average of 0.0070 seconds. I don't really see why RAND() has such a bad reputation. It seems pretty usable to me, at least in this particular situation.

I have three columns in my table:

id [BIGINT(20)] | textfield [tinytext] | number [BIGINT(20)]

Primary key on id, index on number.
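For anyone wanting to reproduce the test, the table described above could be declared roughly like this. This is only a sketch: the index name idx_number is an assumption (the post doesn't name it), and the storage engine and nullability are left at their defaults because the post doesn't mention them.

```sql
-- Sketch of the TestData table as described in the post.
-- idx_number is a hypothetical index name, not taken from the original.
CREATE TABLE TestData (
    id        BIGINT(20) NOT NULL,
    textfield TINYTEXT,
    number    BIGINT(20),
    PRIMARY KEY (id),
    KEY idx_number (number)   -- secondary index used by WHERE number = 41
);
```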

I guess MySQL is smart enough to know that it should only apply RAND() to the 20 rows returned by "WHERE number = 41"? (I specifically added only 20 rows that had the value 41 for 'number'.)
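One way to check that guess, rather than rely on timing alone, would be to ask MySQL for the query plan with EXPLAIN. This is a sketch of the check, not output from the original test, and the exact columns vary by MySQL version.

```sql
-- Show how MySQL plans to execute the query without running it.
EXPLAIN
SELECT * FROM TestData WHERE number = 41 ORDER BY RAND() LIMIT 8;
```

If the guess is right, the key column would be expected to name the index on number, and the rows estimate should be close to 20 rather than 1 million. The Extra column will typically still mention a temporary table and a filesort for ORDER BY RAND(), but in that case the sort only covers the rows that matched the WHERE clause.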

The alternate subquery method returns results in an average of around 0.0080 seconds, which is slightly slower than the non-subquery method.

Subquery method: SELECT * FROM (SELECT * FROM TestData WHERE number = 41) AS t ORDER BY RAND() LIMIT 8
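The post doesn't say how the timings were measured. As one possible approach, session profiling can time both variants side by side from the mysql client. SHOW PROFILES is deprecated in newer MySQL releases, so treat this as a sketch that assumes a version where it is still available.

```sql
-- Turn on per-session query profiling.
SET profiling = 1;

-- Non-subquery method.
SELECT * FROM TestData WHERE number = 41 ORDER BY RAND() LIMIT 8;

-- Subquery method.
SELECT * FROM (SELECT * FROM TestData WHERE number = 41) AS t
ORDER BY RAND() LIMIT 8;

-- List the recorded statements with their durations in seconds.
SHOW PROFILES;

SET profiling = 0;
```

Running each query several times and averaging the durations would give numbers comparable to the averages quoted above.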

Sounds like you're on the right track. One of the best ways to be more efficient in the use of MySQL is to restrict your datasets through masterful queries.

This problem came up in this post a while back: http://www.electrictoolbox.com/mysql-random-order-random-value/ but I really don't want to add another column to my data.