How do you efficiently (in a DB independent manner) select random records from a table?
This seems like an incredibly simple problem; however, it isn't working out as trivially as I'd expected.
I have a club which has club members, and I'd like to pull out two members at random from a club.
Using RANDOM()
One way is to use random ordering:
club.members.find(:all, :order => 'RANDOM()', :limit => 2)
However, that is not database independent: SQLite (the dev database) and Postgres (production) use RANDOM(), whereas in MySQL the function is RAND().
While I could start writing some wrappers around this, the fact that it hasn't been done already and doesn't seem to be part of ActiveRecord tells me something: RANDOM may not be the right way to go.
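For illustration, a wrapper of the kind mentioned here could be as small as a lookup from adapter name to SQL function. This is only a sketch: `random_sql_function` is a hypothetical helper, not part of ActiveRecord, and it assumes adapter names of the form reported by the database connection (for example "SQLite", "PostgreSQL" or "Mysql2").

```ruby
# Hypothetical helper (not part of ActiveRecord): map a database adapter
# name to the SQL function that adapter uses for random ordering.
def random_sql_function(adapter_name)
  case adapter_name.to_s.downcase
  when /mysql/ then 'RAND()'               # MySQL spells it RAND()
  when /sqlite/, /postg/ then 'RANDOM()'   # SQLite and Postgres use RANDOM()
  else
    raise ArgumentError, "unknown adapter: #{adapter_name}"
  end
end
```

The returned string would then be used as the order clause of the query. Note that this only hides the naming difference; it does nothing about the cost of sorting the table randomly.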
Pulling items out directly using their index
Another way of doing this is to pull the set in order but then select random records from it.
First off, we need to generate a sequence of two unique indices corresponding to the members:
all_indices = 1..club.members.count
two_rand_indices = all_indices.to_a.shuffle.slice(0,2)
This gives an array with two indices guaranteed to be unique and random. We can use these indices to pull out our records:
@user1, @user2 = Club.members.values_at(*two_rand_indices)
What's the best method?
While the second method seems pretty nice, I also feel like I might be missing something and might have overcomplicated a simple problem. I'm clearly not the first person to have tackled this, so what is the best, most SQL-efficient route through it?
The problem with your first method is that it sorts the whole table by an unindexable expression, just to take two rows. This does not scale well.
The problem with your second method is similar: if you have 10^9 rows in your table, then to_a will generate a very large array. That will take a lot of memory and time to shuffle.
Also, by using values_at, aren't you assuming that there's a row for every primary key value from 1 to count, with no gaps? You shouldn't assume that.
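As a side note, once the members are loaded into an array, `values_at` addresses zero-based positions rather than key values, which has its own pitfall with indices drawn from 1..count. A small plain-Ruby illustration, with an ordinary array standing in for the loaded members (the names are made up):

```ruby
# Plain array standing in for the loaded members (names are made up).
members = ['alice', 'bob', 'carol']

# Indices drawn from 1..count, as in the question.
all_indices = (1..members.count).to_a

# values_at uses zero-based positions: index 3 is out of range and
# yields nil, while index 0 ('alice') can never be drawn.
picked = members.values_at(*all_indices)
```

Here `picked` ends with a nil in place of a member, so any index scheme needs to run from 0 to count - 1.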
What I'd recommend instead is:
Count the rows in the table.
c = Club.members.count
Pick two random offsets between 0 and the count minus 1 (offsets are zero-based, so an offset equal to the count would return no row).
r_a = 2.times.map { Random.rand(c) }
Query your table with limit and offset. Don't use ORDER BY, just rely on the RDBMS's arbitrary ordering.
for r in r_a
  row = Club.members.limit(1).offset(r)
end
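The steps above can be sketched in plain Ruby, with an ordinary array standing in for the table so the example runs without a database. `random_offsets` is a hypothetical helper name; it draws distinct zero-based offsets (valid values run from 0 to count - 1) without materialising an array of every index, which also avoids picking the same row twice.

```ruby
require 'set'

# Hypothetical helper: pick k distinct random offsets in 0...count
# without building an array of every index (requires k <= count).
def random_offsets(count, k)
  raise ArgumentError, 'k must not exceed count' if k > count
  offsets = Set.new
  offsets << Random.rand(count) while offsets.size < k
  offsets.to_a
end

# Plain array standing in for the table; with ActiveRecord each lookup
# would instead be a limit(1).offset(r) query.
table = %w[alice bob carol dave erin]
rows = random_offsets(table.size, 2).map { |r| table[r] }
```

The retry loop is cheap when k is small relative to the count, which is the case here (two rows out of a large table). It would be a poor choice if k were close to the count.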
See also:
The ORDER BY RAND() function in MySQL:
ORDER BY RAND() LIMIT 4
This selects 4 random rows when it is the final clause of the query.
Try the randumb gem; it implements the second method you mentioned.