How do you efficiently (in a DB independent manner) select random records from a table?

This seems like an incredibly simple problem; however, it isn't working out as trivially as I'd expected.

I have a club which has club members, and I'd like to pull out two members at random from a club.

Using RANDOM()

One way is to use random ordering:

club.members.order('RANDOM()').limit(2)

However, that is not portable: SQLite (the dev database) and Postgres (production) both use RANDOM(), whereas in MySQL the command is RAND().

While I could start writing some wrappers around this, the fact that it hasn't been done already and doesn't seem to be part of ActiveRecord tells me something: RANDOM may not be the right way to go.
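
If you do go down the wrapper route, it could be as small as a lookup keyed on the adapter name. The sketch below is only an illustration: `random_order_sql` is a hypothetical helper (not part of ActiveRecord), and it assumes the adapter name arrives as a string such as the one returned by `ActiveRecord::Base.connection.adapter_name`.

```ruby
# Hypothetical helper: maps a database adapter name to that database's
# random-number SQL function. Not part of ActiveRecord.
def random_order_sql(adapter_name)
  case adapter_name.to_s.downcase
  when /mysql/ then 'RAND()'             # MySQL spells it RAND()
  when /sqlite/, /postg/ then 'RANDOM()' # SQLite and PostgreSQL use RANDOM()
  else
    raise ArgumentError, "no known random function for #{adapter_name}"
  end
end
```

With this in place the query would read something like `club.members.order(random_order_sql(adapter)).limit(2)`, where `adapter` is the name obtained as described above. It removes the portability problem but not the cost of sorting the whole table.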

Pulling items out directly using their index

Another way of doing this is to pull the set in order but then select random records from it:

First off, we need to generate a sequence of two unique indices corresponding to the members:

# Array positions are zero-based, so the valid indices are 0...count
all_indices = 0...club.members.count
two_rand_indices = all_indices.to_a.shuffle.slice(0,2)

This gives an array with two indices guaranteed to be unique and random. We can use these indices to pull out our records:

@user1, @user2 = club.members.to_a.values_at(*two_rand_indices)
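
For a collection that is already in memory, Ruby's `Array#sample` does the shuffle-and-slice in one call and never picks the same position twice. A minimal sketch, using a plain array of made-up names as a stand-in for the loaded members:

```ruby
# Stand-in for the loaded association; the names are illustrative only.
members = %w[alice bob carol dave erin]

# sample(2) picks two elements at distinct random positions.
user1, user2 = members.sample(2)
```

This still loads every member first, so it only suits small clubs.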

What's the best method?

While the second method seems pretty nice, I also feel like I might be missing something and might have overcomplicated a simple problem. I'm clearly not the first person to have tackled this, so what is the best, most SQL-efficient route through it?

The problem with your first method is that it sorts the whole table by an unindexable expression, just to take two rows. This does not scale well.

The problem with your second method is similar: if you have 10^9 rows in your table, then you will generate a large array from to_a. That will take a lot of memory and time to shuffle.

Also, by using values_at, aren't you assuming that there's a row for every primary key value from 1 to count, with no gaps? You shouldn't assume that.
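
To see why gaps matter, consider a table from which some rows have been deleted. The sketch below uses a hash keyed by made-up primary keys as a stand-in for the table: looking up ids in 1..count can miss, while picking by position cannot.

```ruby
# Stand-in table: ids 3 and 4 were deleted, leaving gaps.
rows_by_id = { 1 => 'alice', 2 => 'bob', 5 => 'carol', 6 => 'dave' }
count = rows_by_id.size

# Treating 1..count as ids: ids 3 and 4 find nothing, ids 5 and 6 are never tried.
by_id = (1..count).map { |id| rows_by_id[id] }

# Treating 0...count as positions (what OFFSET does): every row is reachable.
by_offset = (0...count).map { |offset| rows_by_id.values[offset] }
```

Here `by_id` contains two `nil` entries and never reaches carol or dave, whereas `by_offset` covers all four rows.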

What I'd recommend instead is:

  1. Count the rows in the table.

     c = club.members.count
  2. Pick two random offsets between 0 and the count minus 1. The two can coincide, so redraw if you need distinct rows.

     r_a = 2.times.map { Random.rand(c) }
  3. Query your table with limit and offset.
     Don't use ORDER BY; just rely on the RDBMS's arbitrary ordering.

     rows = r_a.map { |r| club.members.limit(1).offset(r).first }
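
Put together, the steps above might look like the sketch below. It is written in plain Ruby so it runs without a database: `random_rows` is a hypothetical helper, and the block stands in for the per-offset query (in Rails it would be something like `club.members.offset(i).first`). It keeps drawing until the offsets are distinct, so the same row is never returned twice.

```ruby
require 'set'

# Hypothetical helper: picks `n` distinct random offsets in 0...count and
# yields each one to the block, which is expected to fetch the row at that offset.
def random_rows(count, n = 2)
  raise ArgumentError, 'not enough rows' if n > count

  offsets = Set.new
  offsets << rand(count) while offsets.size < n # the Set ignores duplicate draws
  offsets.map { |offset| yield offset }
end

# Usage with an array standing in for the table.
table = %w[alice bob carol dave erin]
picked = random_rows(table.size) { |offset| table[offset] }
```

Only the chosen offsets are held in memory, so the cost on the Ruby side depends on `n` rather than on the size of the table.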

The ORDER BY RAND() clause in MySQL:

ORDER BY RAND() LIMIT 4

This will select 4 random rows when the above is the final clause in the query.

Try the randumb gem, which implements the second method you mentioned.
