
How to dynamically perform a weighted random row selection in PostgreSQL?

I have the following table for an app in which students are assigned the task of playing an educational game.

Student{id, last_played_datetime, total_play_duration, total_points_earned}

The app selects a student at random and assigns the task. A student earns a point just for playing the game. The app records the date and time the game was played and for how long. Only one student can be assigned the task at a time. To give all students an equal opportunity, I calculate a weight for each student dynamically from the date and time they last played, their total play duration and their total points earned. A student is then chosen at random, with the choice influenced by that weight.

How do I, in PostgreSQL, randomly select a row from a table depending on the dynamically calculated weight of the row?

The weight for each student is calculated as follows: (minutes(current_datetime - last_played_datetime) * 0.75 + total_play_duration * 0.5 + total_points_earned * 0.25) / 1.5
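To make the formula concrete, here is a small Python sketch of it. It assumes the "minutes" term means the elapsed time in minutes between the current time and `last_played_datetime`, and it takes "now" as a parameter (a hypothetical `now` argument) so the result is reproducible.

```python
from datetime import datetime


def student_weight(last_played: datetime, total_play_minutes: float,
                   total_points: float, now: datetime) -> float:
    """The question's weight formula; 'now' is passed in for reproducibility."""
    minutes_since_last_play = (now - last_played).total_seconds() / 60
    return (minutes_since_last_play * 0.75
            + total_play_minutes * 0.5
            + total_points * 0.25) / 1.5


if __name__ == "__main__":
    # Last played exactly one day (1440 minutes) before "now",
    # 300 minutes of play, 7 points: (1080 + 150 + 1.75) / 1.5
    w = student_weight(datetime(2011, 10, 7), 300, 7, datetime(2011, 10, 8))
    print(round(w, 4))  # 821.1667
```

Because the recency term is measured in minutes, it dominates as soon as a student has been idle for more than a few hours: 30 idle days alone contribute 43200 * 0.75 / 1.5 = 21600, while 550 minutes of play contribute only about 183.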

Sample data:

+====+======================+=====================+=====================+
| Id | last_played_datetime | total_play_duration | total_points_earned |
+====+======================+=====================+=====================+
| 1  | 01/02/2011           | 300 mins            |  7                  |
+----+----------------------+---------------------+---------------------+
| 2  | 06/02/2011           | 400 mins            |  6                  |
+----+----------------------+---------------------+---------------------+
| 3  | 01/03/2011           | 350 mins            |  8                  |
+----+----------------------+---------------------+---------------------+
| 4  | 22/03/2011           | 550 mins            |  9                  |
+----+----------------------+---------------------+---------------------+
| 5  | 01/03/2011           | 350 mins            |  8                  |
+----+----------------------+---------------------+---------------------+
| 6  | 10/01/2011           | 130 mins            |  2                  |
+----+----------------------+---------------------+---------------------+
| 7  | 03/01/2011           |  30 mins            |  1                  |
+----+----------------------+---------------------+---------------------+
| 8  | 07/10/2011           |   0 mins            |  0                  |
+----+----------------------+---------------------+---------------------+
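A sketch of what the formula implies for the sample rows, expressed as each student's chance of being picked (weight divided by the total weight). It assumes the dates are day-first (dd/mm/yyyy, as 22/03/2011 suggests) and uses a hypothetical "now" of 08/10/2011, the day after the most recent play.

```python
from datetime import datetime

# Sample rows from the table above: (id, last played dd/mm/yyyy, minutes, points).
SAMPLE = [
    (1, "01/02/2011", 300, 7),
    (2, "06/02/2011", 400, 6),
    (3, "01/03/2011", 350, 8),
    (4, "22/03/2011", 550, 9),
    (5, "01/03/2011", 350, 8),
    (6, "10/01/2011", 130, 2),
    (7, "03/01/2011", 30, 1),
    (8, "07/10/2011", 0, 0),
]


def selection_probabilities(rows, now):
    """Return {id: weight / total_weight} using the question's weight formula."""
    weights = {}
    for sid, played, minutes, points in rows:
        last = datetime.strptime(played, "%d/%m/%Y")
        idle_minutes = (now - last).total_seconds() / 60
        weights[sid] = (idle_minutes * 0.75 + minutes * 0.5 + points * 0.25) / 1.5
    total = sum(weights.values())
    return {sid: w / total for sid, w in weights.items()}


if __name__ == "__main__":
    probs = selection_probabilities(SAMPLE, datetime(2011, 10, 8))  # assumed "now"
    for sid, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"student {sid}: {p:.3f}")
```

Student 7, idle the longest, gets the highest chance; student 8, who played the day before, gets the lowest; students 3 and 5 have identical rows and therefore identical chances.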

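Here is a solution that works as follows:

  • first compute the weight of each student
  • sum the weights of all students and multiply the total by a random number in [0, 1)
  • then, walking the students in id order, pick the first one whose running (cumulative) weight exceeds that random target

Query:

with
    student_with_weight as (
        select
            id,
            (
                extract(epoch from (now() - last_played_datetime)) / 60 * 0.75
                + total_play_duration * 0.5
                + total_points_earned * 0.25
            ) / 1.5 as weight
        from student
    ),
    student_with_running_weight as (
        -- cumulative weight, walking the students in id order
        select id, sum(weight) over (order by id) as running_weight
        from student_with_weight
    ),
    random_weight as (
        -- computed once: a random point in [0, total weight)
        select random() * (select sum(weight) from student_with_weight) as weight
    )
select s.id
from
    student_with_running_weight s
    inner join random_weight r on s.running_weight > r.weight
order by s.id
limit 1;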
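The same three steps in plain Python, as a sketch for checking the logic outside the database. The function name `pick_weighted` and the optional `r` parameter (to inject a fixed random number for testing) are hypothetical. Note that the comparison is against the running total, not against each student's individual weight: an individual weight is usually far smaller than `random * total`, so comparing against it would often match no row at all.

```python
import random
from typing import Optional, Sequence


def pick_weighted(ids: Sequence[int], weights: Sequence[float],
                  r: Optional[float] = None) -> int:
    """Return the first id (in the given order) whose running weight
    exceeds r * total_weight, with r drawn from [0, 1)."""
    if r is None:
        r = random.random()            # one draw per selection
    target = r * sum(weights)           # steps 1-2: total weight times a random number
    running = 0.0
    for sid, w in zip(ids, weights):    # step 3: first row whose running sum passes the target
        running += w
        if running > target:
            return sid
    # Floating-point round-off can leave the target at the very top of the
    # range; fall back to the last row that has a positive weight.
    return next(sid for sid, w in zip(reversed(ids), reversed(weights)) if w > 0)


if __name__ == "__main__":
    print(pick_weighted([1, 2, 3], [1.0, 2.0, 1.0], r=0.25))  # 2
```

Students with zero weight never win, because their running total equals that of the row before them.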

You can use a cumulative sum of the weights and compare it to random(). It looks like this:

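with s as (
      select id,
             (extract(epoch from (now() - last_played_datetime)) / 60 * 0.75
              + total_play_duration * 0.5
              + total_points_earned * 0.25) / 1.5 as weight
      from student
     )
select s.*
from (select s.*,
             -- order by a unique key: with "order by weight", tied weights
             -- would share one running sum and their intervals would overlap
             sum(weight) over (order by id) as running_weight,
             sum(weight) over () as total_weight
      from s
     ) s cross join
     (values (random())) r(rand)
where r.rand * total_weight >= running_weight - weight and
      r.rand * total_weight < running_weight;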
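Why the window's ordering key should be unique: with the default frame, `sum(weight) over (order by weight)` gives every row the sum over all rows whose weight is less than or equal to its own, so tied rows (students 3 and 5 in the sample data have identical values) share one running sum. The Python sketch below emulates that behaviour; the helper names are hypothetical and the weights are made up for illustration.

```python
def running_sums_range(weights, key):
    """Emulate SQL `sum(weight) over (order by key)` with the default
    RANGE frame: each row sums every row whose key is <= its own key,
    so rows with equal keys (peers) get the same running sum."""
    return [sum(w for w, k in zip(weights, key) if k <= key[i])
            for i in range(len(weights))]


def matching_rows(weights, running, target):
    """Rows whose interval [running - weight, running) contains target."""
    return [i for i, (w, run) in enumerate(zip(weights, running))
            if run - w <= target < run]


weights = [2.0, 2.0, 1.0]     # rows 0 and 1 tie on weight
ids = [1, 2, 3]

by_weight = running_sums_range(weights, key=weights)  # like "order by weight"
by_id = running_sums_range(weights, key=ids)          # like "order by id"

for target in (0.5, 1.5, 2.5, 3.5, 4.5):
    print(target, matching_rows(weights, by_weight, target),
          matching_rows(weights, by_id, target))
```

Ordered by weight, targets between 1 and 3 match no row and targets between 3 and 5 match both tied rows; ordered by the unique id, every target matches exactly one row.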

The VALUES clause ensures that the random value is computed only once for the whole query. If you put random() directly in the WHERE clause, it is re-evaluated for every row, so each row is compared against a different random number and the query can return no row or several rows.
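A small simulation of that difference, as a sketch: it counts how many rows pass the interval test when one random number is shared by all rows, versus when a fresh number is drawn per row (which is roughly what putting random() in the WHERE clause does). The function name and the weights are illustrative assumptions.

```python
import random


def rows_selected(weights, rng, draw_per_row):
    """Count rows passing `running - weight <= rand * total < running`.

    draw_per_row=False: one random number for the whole query.
    draw_per_row=True: a new random number for every row."""
    total = sum(weights)
    shared = rng.random()
    running = 0.0
    hits = 0
    for w in weights:
        running += w
        r = rng.random() if draw_per_row else shared
        if running - w <= r * total < running:
            hits += 1
    return hits


if __name__ == "__main__":
    rng = random.Random(42)
    weights = [3.0, 1.0, 2.0, 4.0]
    once = [rows_selected(weights, rng, False) for _ in range(1000)]
    per_row = [rows_selected(weights, rng, True) for _ in range(1000)]
    print(set(once))              # {1}
    print(sorted(set(per_row)))   # several different row counts
```

With a single shared value the intervals tile [0, total) exactly, so exactly one row always matches; with per-row values, zero or several rows can match.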

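Basically, you can think of the cumulative sums as dividing the range [0, total_weight) into consecutive regions, one per student, each as wide as that student's weight. random() * total_weight then picks a point in that range, and the student whose region contains it is selected, so each student is chosen with probability weight / total_weight.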
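The "regions" picture maps directly onto a binary search over the cumulative weights. The sketch below uses Python's `bisect` module; `make_sampler` is a hypothetical helper, and the empirical check simply confirms that pick frequencies approach weight / total_weight.

```python
import bisect
import itertools
import random
from collections import Counter


def make_sampler(ids, weights, rng=random):
    """Precompute the region boundaries (cumulative weights) once; each call
    maps a random point in [0, total) to the student whose region contains it."""
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]

    def sample():
        point = rng.random() * total
        # Index of the first boundary strictly greater than the point,
        # i.e. the region [cumulative[i-1], cumulative[i]) holding it.
        return ids[bisect.bisect_right(cumulative, point)]

    return sample


if __name__ == "__main__":
    ids = [1, 2, 3, 4]
    weights = [3.0, 1.0, 2.0, 4.0]
    sample = make_sampler(ids, weights, random.Random(7))
    counts = Counter(sample() for _ in range(100_000))
    for sid, w in zip(ids, weights):
        print(sid, w / sum(weights), counts[sid] / 100_000)
```

Zero-weight students occupy empty regions, so `bisect_right` skips over them and they are never returned.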

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM