I have a lottery-type system with random picks that I am trying to optimize.
I have the following constraints:
Here is my current query. It makes an ARBITRARY PICK, but now I want to change or recreate it to make a RANDOM PICK, while avoiding the usual ORDER BY random() LIMIT 1, which has to go through all 1M rows and is very slow, and maybe even avoiding OFFSET, as it is notoriously slow on large datasets.
UPDATE tickets s
SET available = false
FROM (
SELECT id
FROM tickets
WHERE deal_id = #{@deal.id}
AND available
AND pg_try_advisory_xact_lock(id)
LIMIT 1
FOR UPDATE
) sub
WHERE s.id = sub.id
RETURNING s.name, s.id
How can I change this query to move from an arbitrary pick to a RANDOM pick, with the fastest query possible?
If possible, I'd like concrete query suggestions that I can try in my app.
Regarding IDs: there can be huge gaps between IDs in the table as a whole, BUT within the tickets of a specific deal (see the query above) there is no gap between IDs at all, not even the smallest, which I presume matters for finding the most appropriate query.
This makes your life much easier. I'd use the following approach.
0) Create an index on (deal_id, available, id).
1) Get the MIN and MAX values of ID for the given deal_id.
SELECT MIN(id) AS MinID, MAX(id) AS MaxID
FROM tickets
WHERE deal_id = #{@deal.id}
AND available
If this query results in an index scan instead of a seek, use two separate queries for MIN and MAX.
2) Generate a random integer RandID in the found range [MinID; MaxID].
3) Pick the row with ID = RandID. The query should seek the index.
UPDATE tickets s
SET available = false
FROM (
SELECT id
FROM tickets
WHERE deal_id = #{@deal.id}
AND available
AND id = @RandID
AND pg_try_advisory_xact_lock(id)
LIMIT 1
FOR UPDATE
) sub
WHERE s.id = sub.id
RETURNING s.name, s.id
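To make steps 1 to 3 concrete, here is a minimal in-memory sketch in Ruby (the language the query string lives in). It does not talk to PostgreSQL: the `Ticket` struct and the `pick_random_ticket` helper are hypothetical stand-ins for the table and the three queries, and the sketch assumes the deal's available IDs are contiguous.

```ruby
# Hypothetical in-memory stand-in for rows of the tickets table.
Ticket = Struct.new(:id, :deal_id, :available)

# Steps 1 to 3 for one deal, simulated without a database.
def pick_random_ticket(tickets, deal_id, rng = Random.new)
  available = tickets.select { |t| t.deal_id == deal_id && t.available }
  return nil if available.empty?

  # Step 1: MIN(id) and MAX(id) of the deal's available tickets.
  min_id, max_id = available.map(&:id).minmax

  # Step 2: random integer RandID in [min_id, max_id], both ends inclusive.
  rand_id = rng.rand(min_id..max_id)

  # Step 3: the row with id = RandID (an index seek in the real query).
  # Returns nil if that id is not an available ticket of this deal.
  available.find { |t| t.id == rand_id }
end

# Usage: ten contiguous, all-available tickets for a hypothetical deal 7.
tickets = (101..110).map { |id| Ticket.new(id, 7, true) }
winner = pick_random_ticket(tickets, 7, Random.new(1))
winner.available = false if winner # what UPDATE ... SET available = false does
puts winner.id if winner
```

Note that `rng.rand(min_id..max_id)` uses an inclusive range, so both MinID and MaxID can be drawn.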
If there are concurrent processes that can add or delete rows, consider increasing the transaction isolation level to SERIALIZABLE.
Having said all this, I realised that it won't work. When you say that IDs don't have gaps, you most likely mean that there are no gaps among IDs with the same deal_id (regardless of the value of the available column), but not among IDs that have the same deal_id AND available=true.
As soon as the first random row is set to available=false, there will be a gap in the IDs, and a RandID drawn from [MinID; MaxID] can then land on a row that is no longer available, so the UPDATE picks nothing.
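The gap problem can be reproduced with a small hypothetical Ruby sketch (no database; `miss_rate` is an illustrative name). It models one deal's tickets as a hash of id to availability and counts how often a RandID drawn from [MIN, MAX] of the still-available IDs lands on a ticket that has already been taken.

```ruby
# Hypothetical model of one deal's tickets: id => available?
# Returns the fraction of RandIDs drawn from [MIN, MAX] of the available ids
# that hit a ticket that is no longer available (the UPDATE would match nothing).
def miss_rate(available_by_id, draws, rng)
  live_ids = available_by_id.select { |_id, avail| avail }.keys
  min_id, max_id = live_ids.minmax
  misses = draws.times.count { !available_by_id.fetch(rng.rand(min_id..max_id), false) }
  misses.fdiv(draws)
end

tickets = (1..10).map { |id| [id, true] }.to_h
puts miss_rate(tickets, 10_000, Random.new(42)) # 0.0: no gaps yet
tickets[5] = false                               # one pick taken from the middle
puts miss_rate(tickets, 10_000, Random.new(42)) # about 0.1: id 5 is now a gap
```

Only interior picks hurt: if the taken ticket is the current MIN or MAX, the range simply shrinks. As more tickets are drawn, interior gaps accumulate and the miss rate keeps growing.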
Second attempt
Add a float column RandomNumber to the tickets table that holds a random number in the range (0,1). Whenever you add a row to this table, generate such a random number and save it in this column.
Create an index on (deal_id, available, RandomNumber).
Order by this RandomNumber to pick a random row that is still available. The query should seek the index.
UPDATE tickets s
SET available = false
FROM (
SELECT id
FROM tickets
WHERE deal_id = #{@deal.id}
AND available
AND pg_try_advisory_xact_lock(id)
ORDER BY RandomNumber
LIMIT 1
FOR UPDATE
) sub
WHERE s.id = sub.id
RETURNING s.name, s.id
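Why ordering by a stored random key still gives a fair draw: the keys are assigned once, independently, when the rows are inserted, so ordering the available tickets by them walks through one uniformly random permutation of the deal's tickets, and each pick takes the next ticket in that shuffled order. Here is a hypothetical in-memory Ruby sketch of that idea; `InMemoryDeal`, `add_ticket` and `draw!` are illustrative names, not part of any library or of the app.

```ruby
# Hypothetical in-memory model: each ticket gets a random key once, on insert.
class InMemoryDeal
  def initialize(rng = Random.new)
    @rng = rng
    @tickets = [] # entries: { id:, random_number:, available: }
  end

  # Mirrors "generate the random number whenever you add a row".
  def add_ticket(id)
    @tickets << { id: id, random_number: @rng.rand, available: true }
  end

  # Mirrors ORDER BY RandomNumber LIMIT 1 over available rows, then the UPDATE.
  def draw!
    ticket = @tickets.select { |t| t[:available] }.min_by { |t| t[:random_number] }
    return nil unless ticket

    ticket[:available] = false
    ticket[:id]
  end
end

deal = InMemoryDeal.new(Random.new(2024))
(1..5).each { |id| deal.add_ticket(id) }
picks = Array.new(5) { deal.draw! }
p picks      # a shuffled permutation of 1..5
p deal.draw! # => nil, every ticket has been drawn
```

One consequence of this design: the draw order is fixed as soon as the rows are inserted, so anyone who can read the RandomNumber column can predict the next pick.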