Rails & Postgres: collection of random records with pagination and custom weight on a specific column

I would like to retrieve a random, paginated collection of items in which created_at carries a particular weight.

I have already managed to retrieve a random, paginated collection using the Postgres setseed function.

The question is how to combine some kind of weighting on created_at (so that the weighted items have a better chance of appearing in the random sample) with this setseed approach in Postgres.

One idea is to retrieve the items, attach the weight I want to each of them, and then run my random query, but I doubt that would perform well.

I'm at something of a dead end and don't know how to approach this.

Here is what I have so far. I simply call setseed so that each of my pages gets a different batch of random items:

Item.connection.execute "select setseed(0.5)"
Item.where(...).order('random()').page(params[:page]).per_page(15)
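
As a rough in-memory analogy of why the fixed seed matters (plain Ruby, not the Postgres mechanism itself): reseeding with the same value before every request reproduces the same shuffled order, so slicing that order into pages yields batches that do not overlap. The seeded_page helper and the integer seed 42 are hypothetical names and values chosen for illustration.

```ruby
# Shuffle deterministically with a fixed seed, then return one page of results.
# Hypothetical helper for illustration; pages are numbered from 1.
def seeded_page(items, seed:, page:, per_page: 15)
  shuffled = items.shuffle(random: Random.new(seed))
  shuffled.each_slice(per_page).to_a[page - 1] || []
end

ids = (1..40).to_a
page_one = seeded_page(ids, seed: 42, page: 1)
page_two = seeded_page(ids, seed: 42, page: 2)

puts page_one.inspect
puts page_two.inspect
puts "overlap: #{(page_one & page_two).inspect}"
```

Changing the seed between requests would reshuffle everything, and the same item could then show up on more than one page.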

I would suggest converting your created_at to a float. Here is an example:

# Order by the SQL alias as a string; a hash argument would be qualified with the table name.
Item.select("*, RANDOM() * to_char(created_at, 'YYYYMMDD')::float AS my_new_order_val").order("my_new_order_val DESC")

Randomized sorting weighted by the created_at timestamp can be achieved with some math.

The random() function in Postgres always returns a value where 0.0 <= random() < 1.0.

Since you want the newest items first, create a newness ratio so that anything created just now has a ratio of 1/1, or 100%.

Anything older than that has a newness ratio below 100%.

For example, if now() in epoch time is 1645955465, yesterday is 1645869065, and a year ago is 1614419721, the ratios are:

now/now is  1645955465/1645955465 = 1.0
yesterday/now is 1645869065/1645955465 = 0.99994
1 year ago/now is 1614419721/1645955465 = 0.98084

The ratio calculation above may work for you. In it, now is 100% new, yesterday is 99.994% new, and one year ago is 98.084% new.
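
The arithmetic above can be reproduced with a few lines of plain Ruby. This is only a sketch: newness_ratio and the constant names are hypothetical, and the epoch values are the ones from the example. The ratios are truncated (not rounded) to five decimals so they line up with the figures quoted above.

```ruby
# Newness ratio: an item's creation time divided by the current time,
# both expressed as Unix epoch seconds.
def newness_ratio(created_at_epoch, now_epoch)
  created_at_epoch.to_f / now_epoch
end

NOW          = 1_645_955_465
YESTERDAY    = 1_645_869_065
ONE_YEAR_AGO = 1_614_419_721

RATIOS = {
  now:          newness_ratio(NOW, NOW),
  yesterday:    newness_ratio(YESTERDAY, NOW),
  one_year_ago: newness_ratio(ONE_YEAR_AGO, NOW)
}

# Float#floor(5) truncates to five decimal places.
RATIOS.each { |label, ratio| puts "#{label}: #{ratio.floor(5)}" }
```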

Next, multiply the newness ratio by a random number. This gives you a weighted random number in which newer items carry more weight. To do the calculation, extract the epoch from created_at, divide it by the epoch of now() to get the newness ratio, and multiply the result by a random number.
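
To get a feel for how much weight the ratio actually adds, here is a small plain-Ruby sketch (the helper names are hypothetical). For two items with ratios newer >= older > 0, each multiplied by an independent uniform random number in [0, 1), the chance that the newer item ends up with the larger product works out to 1 - older / (2 * newer). The seeded simulation is there only as a sanity check on that closed form.

```ruby
# Probability that an item with newness ratio `newer` gets a larger weighted
# random value than one with ratio `older` (requires newer >= older > 0).
def win_probability(newer, older)
  1.0 - older / (2.0 * newer)
end

# Monte Carlo estimate of the same quantity, seeded so runs are repeatable.
def simulated_win_rate(newer, older, trials: 100_000, seed: 1)
  rng = Random.new(seed)
  wins = trials.times.count { newer * rng.rand > older * rng.rand }
  wins.to_f / trials
end

now_ratio  = 1.0
year_ratio = 1_614_419_721.0 / 1_645_955_465 # one-year-old item from the example

puts win_probability(now_ratio, year_ratio)
puts simulated_win_rate(now_ratio, year_ratio)
```

With the example ratios, a brand-new item beats a year-old one only about 51% of the time, barely better than a coin flip.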

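# Arel.sql marks the raw SQL as trusted; Rails 6 and later reject raw strings in order().
# DESC puts the largest weighted values, which favor newer items, first.
Item.where(...)
  .order(Arel.sql(
    "(extract(epoch from created_at) / extract(epoch from now())) * random() DESC"
  ))
  .page(params[:page])
  .per_page(15)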

Depending on your data, the ratios might not differ enough to have a noticeable effect on the randomized sort. There are many ways to manipulate the randomized sort beyond what's described above. For example, you could reduce the amount of randomness by giving the randomizer a smaller range than 0 to 1. Or you could make the newness ratios span a greater range than 0.98084 to 1.
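
Both adjustments can be sketched in plain Ruby. This is one possible approach under stated assumptions, not the only one: the newness ratio is rescaled against the oldest item so that it spans 0 to 1, and the random factor is drawn from [floor, 1) instead of [0, 1). All helper names, the floor of 0.5, and the seed are hypothetical.

```ruby
# Rescale newness so the oldest item maps to 0.0 and the current time to 1.0.
# Assumes now_epoch > oldest_epoch.
def stretched_newness(created_epoch, oldest_epoch, now_epoch)
  (created_epoch - oldest_epoch).to_f / (now_epoch - oldest_epoch)
end

# Draw the random factor from [floor, 1.0) rather than [0.0, 1.0).
def narrowed_random(rng, floor)
  floor + (1.0 - floor) * rng.rand
end

# Sort key: larger values should come first.
def sort_key(created_epoch, oldest_epoch, now_epoch, rng, floor: 0.5)
  stretched_newness(created_epoch, oldest_epoch, now_epoch) * narrowed_random(rng, floor)
end

NOW_EPOCH    = 1_645_955_465
OLDEST_EPOCH = 1_614_419_721 # the one-year-old item from the example
rng = Random.new(7)

[NOW_EPOCH, 1_645_869_065, OLDEST_EPOCH].each do |created|
  ratio = stretched_newness(created, OLDEST_EPOCH, NOW_EPOCH)
  key   = sort_key(created, OLDEST_EPOCH, NOW_EPOCH, rng)
  puts "#{created} ratio=#{ratio.round(5)} key=#{key.round(5)}"
end
```

One trade-off of this particular rescaling is that the oldest item always gets a key of 0 and therefore always sorts last; adding a small constant to the ratio would keep it in play.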
