
postgresql 9.4/9.5 - SELECT…FOR UPDATE one single random row on a large dataset with high reads and writes

I have a type of lottery system with random picks I am trying to optimize.

I have the following constraints:

  • I need to apply the SELECT...FOR UPDATE only to rows whose deal_id is the current deal of my app (i.e. I don't apply it to the WHOLE table / ALL rows of the table, only to those where, for example, deal_id = 3)
  • I need to select only rows where available = true
  • I need to output only 1 row: when a player buys a ticket I must go through these 1 million rows and RANDOMLY choose one for him (only one, so many Stack Overflow solutions, such as TABLESAMPLE, do not easily apply)
  • I usually have about 1 million rows matching deal_id = 3 (3 as an example) and available = true (out of a total of about 30M rows at any given time)
  • I have very high READS and WRITES: about 50 to 100+ concurrent reads on the table, and consequently roughly the same number of writes (once a row is chosen, available is changed from true to false inside the SELECT...FOR UPDATE)
  • A row is locked while its select/update runs. For now I'm using SELECT...FOR UPDATE with pg_try_advisory_xact_lock (and when PostgreSQL 9.5 is out of beta, I will use SKIP LOCKED)
  • I need blazing-fast speed: I target a query under 5 ms
  • Regarding IDs: there can be huge gaps between IDs in the table as a whole, BUT within the tickets of a specific deal (see query below) there are no gaps between IDs (not even the smallest), which I presume matters for finding the most appropriate query

Here is my current query. It is an ARBITRARY pick, but now I want to change it to a RANDOM pick (while avoiding the usual ORDER BY random() LIMIT 1, which has to go through all 1M rows and is very slow, and perhaps also OFFSET, which is notoriously slow on large datasets).

UPDATE tickets s
        SET available = false
        FROM (
              SELECT id
              FROM   tickets
              WHERE  deal_id = #{@deal.id}
              AND    available
              AND    pg_try_advisory_xact_lock(id)
              LIMIT  1
              FOR    UPDATE
              ) sub
        WHERE         s.id = sub.id
        RETURNING     s.name, s.id

How do I change this query to move from an arbitrary pick to a RANDOM pick, with the fastest query possible?

If possible, I'd like tangible query suggestions that I can try in my app.

"Regarding IDs: there can be huge gaps between IDs in the table as a whole, BUT within the tickets of a specific deal (see query below) there are no gaps between IDs (not even the smallest), which I presume matters for finding the most appropriate query."

This makes your life much easier. I'd use the following approach.

0) Create an index on (deal_id, available, id).
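A sketch of that index creation (the index name is my own choice). As an assumption beyond the answer, a partial index on available rows might also be worth testing, since the query only ever touches rows with available = true:

```sql
-- Composite index as described in step 0 (name is illustrative):
CREATE INDEX idx_tickets_deal_avail_id ON tickets (deal_id, available, id);

-- Possible alternative (assumption, not from the answer): a smaller
-- partial index, since only available = true rows are ever selected:
-- CREATE INDEX idx_tickets_deal_id_avail ON tickets (deal_id, id) WHERE available;
```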

1) Get the MIN and MAX values of id for the given deal_id:

SELECT MIN(id) AS MinID, MAX(id) AS MaxID
FROM   tickets
WHERE  deal_id = #{@deal.id}
AND    available

If this query results in an index scan instead of a seek, use two separate queries for MIN and MAX.
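A sketch of the two separate queries, assuming the (deal_id, available, id) index from step 0; each should reduce to a single-row scan from one end of the index:

```sql
-- MinID: first matching id in index order
SELECT id FROM tickets
WHERE  deal_id = #{@deal.id} AND available
ORDER  BY id ASC  LIMIT 1;

-- MaxID: last matching id in index order
SELECT id FROM tickets
WHERE  deal_id = #{@deal.id} AND available
ORDER  BY id DESC LIMIT 1;
```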

2) Generate a random integer RandID in the found range [MinID; MaxID].
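Since the question's queries interpolate @deal.id, the app is presumably Ruby; step 2 could then be as simple as the following (random_ticket_id is a hypothetical helper name):

```ruby
# Hypothetical helper: pick a uniformly random ID in the inclusive
# range [min_id, max_id] returned by the MIN/MAX query of step 1.
def random_ticket_id(min_id, max_id)
  rand(min_id..max_id) # rand with a Range includes both endpoints
end
```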

3) Pick the row with id = RandID. The query should seek the index.

UPDATE tickets s
    SET available = false
    FROM (
          SELECT id
          FROM   tickets
          WHERE  deal_id = #{@deal.id}
          AND    available
          AND    id = @RandID
          AND    pg_try_advisory_xact_lock(id)
          LIMIT  1
          FOR    UPDATE
          ) sub
    WHERE         s.id = sub.id
    RETURNING     s.name, s.id

If there are concurrent processes that can add or delete rows, consider raising the transaction isolation level to SERIALIZABLE.


Having said all this, I realised it won't work. When you say that IDs don't have gaps, you most likely mean that there are no gaps among IDs with the same deal_id (regardless of the value of the available column), but not among IDs that have the same deal_id AND available = true.

As soon as the first random row is set to available = false, there will be a gap in the IDs.


Second attempt

Add a float column RandomNumber to the tickets table holding a random number in the range (0,1). Whenever you insert a row into this table, generate such a random number and store it in this column.

Create an index on (deal_id, available, RandomNumber).
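A sketch of the schema change, with column and index names of my own choosing; note that PostgreSQL's random() returns a double precision value in [0,1), and the DEFAULT keeps newly inserted rows populated automatically:

```sql
-- Assumed names, not prescribed by the answer.
ALTER TABLE tickets
  ADD COLUMN randomnumber double precision NOT NULL DEFAULT random();

CREATE INDEX idx_tickets_deal_avail_rand
  ON tickets (deal_id, available, randomnumber);
```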

ORDER BY this RandomNumber to pick a random row that is still available. The query should seek the index.

UPDATE tickets s
    SET available = false
    FROM (
          SELECT id
          FROM   tickets
          WHERE  deal_id = #{@deal.id}
          AND    available
          AND    pg_try_advisory_xact_lock(id)
          ORDER BY RandomNumber
          LIMIT  1
          FOR    UPDATE
          ) sub
    WHERE         s.id = sub.id
    RETURNING     s.name, s.id
