简体   繁体   中英

How to randomly select some, say 1000, values from a specific columns in a very large table in MySQL?

I have a very large table in MySQL database, which has a columns names exa_id and the number of rows of this table is more than 10,000,000. I want to randomly and efficiently select only 1000 of the data through pandas.read_sql statement in Python. How can I write the code?

The SQL select ext_id from table_name order by rand() limit 1000 performs really bad, I'd like to another way.

One more explanation is that the contents of column exa_id are strings, like 'uudjsx-2220983-df','ujxnas-9800xdsd-d2',..., not auto-increasing sequence.

This works under most circumstances:

select ext_id
from table_name t
where rand() < 2000 / 10000000
order by rand()
limit 1000;

The inner query selects approximately 2000 rows. There is some statistical variability. The outer query then orders these randomly and selects 1000 of them.

If you don't know the number of rows, you can do:

select t.ext_id
from table_name t cross join
     (select count(*) as cnt from t) tt
where rand() < 2000 / tt.cnt
order by rand()
limit 1000;

This Query Will help you.

SELECT name  FROM random AS r1
JOIN (SELECT CEIL(RAND() * (SELECT MAX(id) FROM random)) AS id) AS r2 
WHERE r1.id >= r2.id
ORDER BY r1.id ASC
LIMIT 100

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM