简体   繁体   中英

Mysql Deleting multiple random row from a table

I now have a table that has 604 000 row. I would like to drop 4000 random rows so my table will only contains 600 000 entries.

Would there be a quick way to do so ?

Thanks a lot.

In theory, this will be random and fast. In practise, it will be only fast:

DELETE FROM tableX
LIMIT 4000

This will be random but terribly slow, with 600K rows:

DELETE FROM tableX
ORDER BY RAND() 
LIMIT 4000

This won't be truly random (as there are usually gaps in the ids) and it may not even delete exactly 4000 rows (but a few less when there are many gaps) but it's probably faster than the previous.

An extra wrapping in a subquery is needed because the syntax for Delete from multiple tables does not allow LIMIT :

DELETE td
FROM
      tableX AS td
  JOIN 
      ( SELECT t.id
        FROM
             tableX AS t 
          CROSS JOIN
              ( SELECT MAX(id) AS maxid 
                FROM tableX
              ) AS m
          JOIN
              ( SELECT RAND() AS rndm
                FROM tableX AS tr
                LIMIT 5000
              ) AS r
            ON 
               t.id = CEIL( rndm * maxid )
        LIMIT 4000
      ) AS x
      ON 
         x.id = td.id 

Explain output (of the subquery, from a 400K rows table):

id           table        possible_keys  key_len     rows 
 select_type         type             key     ref         Extra
1 PRIMARY <derived2> system                             1
1 PRIMARY <derived3> ALL                             5000
1 PRIMARY t          eq_ref PRIMARY PRIMARY 4 func      1 Using where;Using index
3 DERIVED tr         index          PRIMARY 4      398681 Using index
2 DERIVED                                            Select tables optimized away
delete from yourTable limit 4000

如果我不得不猜测:

DELETE FROM table where id = (SELECT id FROM table ORDER BY rand() LIMIT 1) LIMIT 10
DELETE FROM TABLE ORDER BY RAND() LIMIT 4000;

It'll take time though...

A quicker way to execute (not write the code!) may be 4000 separate deletes in a loop

DELETE FROM TABLE WHERE AssumedPKisInt = <ARandomNumber>

Of course, you need to ensure you don't try to delete non-existing or already-deleted rows.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM