
MySQL RAND() in WHERE clause matches a small set of rows

I ran into an interesting issue with MySQL. When I multiply the RAND() function by a large integer, the largest random number I get is really small. Here is my MySQL query, which should be a very fast random query, but the highest ID it returns is about 36000, even though there are 4600000+ IDs.

SET @maxID=(SELECT MAX(id) FROM property); #it's about 4600000

SELECT * FROM property
WHERE 
downloaded_at IS NULL
AND id >= FLOOR(1 + RAND() * @maxID) #this returns max +/-36000
LIMIT 100

When I move this expression into a plain SELECT query, everything is fine:

SELECT FLOOR(1 + RAND() * (SELECT MAX(id) FROM property))

Could someone please explain why this happens? Thank you!

Edits


Hm, somehow when I remove downloaded_at IS NULL it starts to make sense: the IDs are higher, but the results are not that random anymore.


I can't use ORDER BY RAND(), because the table is too big: the query is too slow and the whole server eventually crashes within a few minutes.


The MySQL version is 5.7.21-0ubuntu0.16.04.1.

Your random row selection method is biased: because RAND() is re-evaluated for each row, the probability of a row being selected is proportional to its id. E.g. if you had 10 rows with id = 1 to 10, then 1 has a 10% chance of being selected, 2 has 20%, and so on.
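The 10-row example can be checked with a small simulation. This is only a Python sketch of the filter `id >= FLOOR(1 + RAND() * @maxID)`, not MySQL itself; the function name `pass_rate` and the trial count are made up for illustration, and it assumes a fresh random value is drawn on every evaluation.

```python
import random

def pass_rate(row_id, max_id, trials, rng):
    """Fraction of trials in which row_id >= FLOOR(1 + RAND() * max_id) holds."""
    hits = 0
    for _ in range(trials):
        # A fresh threshold is drawn for every evaluation of the condition.
        threshold = int(1 + rng.random() * max_id)
        if row_id >= threshold:
            hits += 1
    return hits / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
rates = {row_id: pass_rate(row_id, 10, 20_000, rng) for row_id in (1, 5, 10)}
print(rates)
```

The measured rates should come out close to 0.1, 0.5 and 1.0 for ids 1, 5 and 10, matching the "proportional to id" claim.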

Also, the reason your code only selects ids below ~36000 is straightforward: rows are (usually) processed in PK order, and by the time the 100th matching row is found, the query has only processed rows with ids up to around 36000.
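A rough simulation of that scan illustrates the effect. It is a Python sketch under simplifying assumptions that are not from the question: ids are contiguous from 1 to 4,600,000 and every row passes downloaded_at IS NULL. Under those assumptions the expected number of matches among the first k rows is about k^2 / (2 * maxID), which reaches 100 near k ≈ 30,000. The function name `scan_with_per_row_rand` is hypothetical.

```python
import random

def scan_with_per_row_rand(max_id, limit, rng):
    """Walk ids in primary-key order, drawing a new threshold for every row."""
    picked = []
    for row_id in range(1, max_id + 1):
        threshold = int(1 + rng.random() * max_id)  # re-evaluated per row
        if row_id >= threshold:
            picked.append(row_id)
            if len(picked) == limit:
                break  # LIMIT reached, so the scan stops here
    return picked

rng = random.Random(1)  # fixed seed so the run is repeatable
picked = scan_with_per_row_rand(4_600_000, 100, rng)
print(len(picked), picked[-1])
```

The last id printed should land in the tens of thousands, the same order of magnitude as the ~36000 reported in the question, even though the table has millions of ids.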

Now, if you are interested in selecting 100 random rows, you can use this query instead:

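-- MySQL does not allow LIMIT inside an IN (...) subquery,
-- so the random ids are picked in a derived table and joined back.
SELECT p.*
FROM property p
JOIN (
    SELECT id
    FROM property
    WHERE downloaded_at IS NULL
    ORDER BY RAND()
    LIMIT 100
) r ON r.id = p.id;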

Or maybe this (rough outline):

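-- Derived table instead of IN (...), because MySQL rejects LIMIT in an IN subquery.
SELECT p.*
FROM property p
JOIN (
    SELECT id
    FROM property
    WHERE RAND() <= 100.0 / @maxID -- explanation below
    LIMIT 100
) r ON r.id = p.id;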

The above does not involve sorting, but it still needs to scan all ids. 100.0 is the desired number of rows; use a somewhat larger value to be sure you get enough. This should give each row an equal probability of being selected.
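The idea behind `RAND() <= 100.0 / @maxID` can be sketched in Python as keeping each row independently with a fixed probability. The table of 100,000 contiguous ids, the function name `bernoulli_sample` and the `oversample` parameter are assumptions made for this illustration only.

```python
import random

def bernoulli_sample(ids, want, rng, oversample=1.0):
    """Keep each id independently with probability want * oversample / len(ids),
    then truncate to `want` rows, as LIMIT would."""
    p = want * oversample / len(ids)
    kept = [row_id for row_id in ids if rng.random() <= p]
    return kept[:want]

rng = random.Random(2)  # fixed seed so the run is repeatable
ids = list(range(1, 100_001))  # hypothetical table with 100,000 contiguous ids
sample = bernoulli_sample(ids, want=100, rng=rng, oversample=2.0)
print(len(sample), sample[:5])
```

With oversample=1.0 the number of kept rows only averages 100, so many runs fall short of 100 rows. A larger factor avoids that, with the caveat that LIMIT then cuts off the end of the scan, so the highest ids become somewhat less likely to appear.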

The problem is that RAND() is called every time the condition in the WHERE clause is evaluated. Instead, put the value in a subquery:

SELECT p.*
FROM property p CROSS JOIN
     (SELECT FLOOR(1 + RAND() * @maxID) as idlim) x
WHERE p.downloaded_at IS NULL AND
      p.id >= x.idlim -- compare every row against the same precomputed threshold
LIMIT 100;

This ensures that the RAND() function is called only once.
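For comparison, here is a Python sketch of what a single threshold does. It assumes contiguous ids from 1 to 4,600,000 and that every row passes downloaded_at IS NULL; neither assumption comes from the question, and the function name `scan_with_single_rand` is hypothetical.

```python
import random

def scan_with_single_rand(max_id, limit, rng):
    """Draw the threshold once, then take the first `limit` ids at or above it."""
    threshold = int(1 + rng.random() * max_id)  # evaluated a single time
    # With an index on id, the scan can start directly at the threshold.
    picked = list(range(threshold, min(threshold + limit, max_id + 1)))
    return threshold, picked

rng = random.Random(3)  # fixed seed so the run is repeatable
threshold, picked = scan_with_single_rand(4_600_000, 100, rng)
print(threshold, picked[0], picked[-1])
```

Under these assumptions the result is one contiguous run of ids beginning at a random point anywhere in the table, rather than 100 independently chosen rows. If the threshold lands within the last 99 ids, fewer than 100 rows come back.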
