How to dynamically perform a weighted random row selection in PostgreSQL?

I have the following table for an app in which students are assigned the task of playing an educational game.

Student{id, last_played_datetime, total_play_duration, total_points_earned}

The app selects a student at random and assigns the task. The student earns a point just for playing the game. The app records the date and time when the game was played and for how long. I want to randomly select a student and assign the task; only one student can be assigned the task at a time. To give all students an equal opportunity, I dynamically calculate a weight for each student from the date and time the student last played the game, the total play duration, and the total points earned. A student is then chosen at random, with the choice influenced by that weight.

How do I, in PostgreSQL, randomly select a row from a table based on the dynamically calculated weight of each row?

The weight for each student is calculated as follows: (minutes(current_datetime - last_played_datetime) * 0.75 + total_play_duration * 0.5 + total_points_earned * 0.25) / 1.5
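To make the formula concrete, here is a minimal Python sketch that evaluates it for one student. The function name `student_weight`, its parameter names, and the sample timestamps are hypothetical and chosen only for illustration; the sketch assumes `total_play_duration` is stored as a number of minutes.

```python
from datetime import datetime

def student_weight(now, last_played, total_play_minutes, total_points):
    """Weight per the formula in the question; durations are in minutes."""
    minutes_since_played = (now - last_played).total_seconds() / 60
    return (
        minutes_since_played * 0.75
        + total_play_minutes * 0.5
        + total_points * 0.25
    ) / 1.5

# Hypothetical values: the student last played exactly one day (1440 minutes) ago.
now = datetime(2011, 3, 2, 12, 0)
last_played = datetime(2011, 3, 1, 12, 0)
example_weight = student_weight(now, last_played, 300, 7)
print(example_weight)  # (1080 + 150 + 1.75) / 1.5, roughly 821.17
```

Note that the elapsed-minutes term grows quickly, so after even a single day it is already much larger than the other two terms in this example.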

Sample data:

+====+======================+=====================+=====================+
| Id | last_played_datetime | total_play_duration | total_points_earned |
+====+======================+=====================+=====================+
| 1  | 01/02/2011           | 300 mins            |  7                  |
+----+----------------------+---------------------+---------------------+
| 2  | 06/02/2011           | 400 mins            |  6                  |
+----+----------------------+---------------------+---------------------+
| 3  | 01/03/2011           | 350 mins            |  8                  |
+----+----------------------+---------------------+---------------------+
| 4  | 22/03/2011           | 550 mins            |  9                  |
+----+----------------------+---------------------+---------------------+
| 5  | 01/03/2011           | 350 mins            |  8                  |
+----+----------------------+---------------------+---------------------+
| 6  | 10/01/2011           | 130 mins            |  2                  |
+----+----------------------+---------------------+---------------------+
| 7  | 03/01/2011           |  30 mins            |  1                  |
+----+----------------------+---------------------+---------------------+
| 8  | 07/10/2011           |   0 mins            |  0                  |
+----+----------------------+---------------------+---------------------+

Here is a solution that works as follows:

  • first compute the weight of each student
  • sum the weights of all students and multiply the sum by a random number between 0 and 1 to get a target
  • then pick the first student whose cumulative (running) weight exceeds that random target

Query:

with 
    student_with_weight as (
        select 
            id,
            (
                extract(epoch from (now() - last_played_datetime)) / 60 * 0.75
                + total_play_duration * 0.5
                + total_points_earned * 0.25
            ) / 1.5 as weight
        from student
    ),
    -- running total of the weights, accumulated in id order
    student_with_running_weight as (
        select id, sum(weight) over (order by id) as running_weight
        from student_with_weight
    ),
    -- a single random target between 0 and the total weight
    random_weight as (
        select random() * (select sum(weight) from student_with_weight) as weight
    )
select s.id 
from 
    student_with_running_weight s
    inner join random_weight r on s.running_weight > r.weight
order by s.running_weight
limit 1;
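The three steps can also be sketched outside the database. The following Python function is only an illustration of the idea, not a translation of any particular query; the name `pick_by_cumulative_weight` and the sample weights are hypothetical, and the random number is passed in as `u` so the behaviour is easy to check by hand.

```python
def pick_by_cumulative_weight(weights_by_id, u):
    """Return the id whose cumulative-weight slice contains u * total.

    weights_by_id: list of (id, weight) pairs with non-negative weights.
    u: a number in [0, 1), for example from random.random().
    """
    total = sum(weight for _, weight in weights_by_id)
    target = u * total
    running = 0.0
    for student_id, weight in weights_by_id:
        running += weight
        if running > target:
            return student_id
    # Reached only if no running total exceeds the target,
    # for example when the list is empty or every weight is zero.
    return None

sample = [(1, 10.0), (2, 30.0), (3, 60.0)]  # hypothetical weights
print(pick_by_cumulative_weight(sample, 0.05))  # target 5 falls in id 1's slice
print(pick_by_cumulative_weight(sample, 0.25))  # target 25 falls in id 2's slice
print(pick_by_cumulative_weight(sample, 0.99))  # target 99 falls in id 3's slice
```

Because the comparison is against the running total rather than each row's own weight, a student with weight 60 out of 100 owns 60% of the range and is picked about 60% of the time.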

You can use a cumulative sum on the weights and compare it to random(). It looks like this:

with s as (
      select s.*, 
             <your expression> as weight
      from student s
     )
select s.*
from (select s.*,
             -- order by the unique id so that rows with equal weights
             -- still get distinct running totals
             sum(weight) over (order by id) as running_weight,
             sum(weight) over () as total_weight
      from s
     ) s cross join
     (values (random())) r(rand)
where r.rand * total_weight >= running_weight - weight and
      r.rand * total_weight < running_weight;

The values() clause ensures that the random value is calculated only once for the query. Funky things can happen if you put random() in the where clause, because it will be recalculated for each comparison.
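To illustrate why a single shared random value matters, here is a simplified Python model. It is not a description of how PostgreSQL executes a query; it only contrasts one draw shared by all rows with a fresh draw per row. The function names and the weights are hypothetical.

```python
import random

def matches_with_shared_draw(weights, rng):
    """Count rows matched when one random value is shared by all rows."""
    total = sum(weights)
    target = rng.random() * total
    running, matched = 0.0, 0
    for w in weights:
        lower, running = running, running + w
        if lower <= target < running:
            matched += 1
    return matched

def matches_with_per_row_draw(weights, rng):
    """Count rows matched when each row compares against its own fresh draw."""
    total = sum(weights)
    running, matched = 0.0, 0
    for w in weights:
        lower, running = running, running + w
        target = rng.random() * total  # re-drawn for every row
        if lower <= target < running:
            matched += 1
    return matched

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
weights = [1.0, 2.0, 3.0, 4.0]
shared = [matches_with_shared_draw(weights, rng) for _ in range(1000)]
per_row = [matches_with_per_row_draw(weights, rng) for _ in range(1000)]
print(set(shared))          # a shared draw matches exactly one row every time
print(sorted(set(per_row)))  # per-row draws can match no row, one row, or several
```

With a shared draw the slices partition the range, so exactly one row matches. With a fresh draw per row, each row is tested against a different target, so the query could return nothing or more than one student.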

Basically, you can think of the cumulative sum as dividing up the total weight into discrete regions. The random value then just picks one of them.
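The region picture can be checked empirically. The sketch below, with hypothetical helper names `build_regions` and `pick_index` and made-up weights, builds the cumulative boundaries, locates the region for each random draw with a binary search, and compares the observed selection frequencies with the weights.

```python
import bisect
import itertools
import random
from collections import Counter

def build_regions(weights):
    """Cumulative sums: region i spans [bounds[i] - weights[i], bounds[i])."""
    return list(itertools.accumulate(weights))

def pick_index(bounds, u):
    """Index of the region containing u * total, for u in [0, 1)."""
    return bisect.bisect_right(bounds, u * bounds[-1])

rng = random.Random(42)  # fixed seed only to make the sketch repeatable
weights = [1.0, 2.0, 7.0]
bounds = build_regions(weights)  # [1.0, 3.0, 10.0]
draws = 20000
counts = Counter(pick_index(bounds, rng.random()) for _ in range(draws))
frequencies = [counts[i] / draws for i in range(len(weights))]
print(frequencies)  # should be close to [0.1, 0.2, 0.7]
```

Each region's width equals the row's weight, so the chance of landing in it is that weight divided by the total.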
