
Select 1000 samples per group from a table in AWS Athena

I have a table with the below schema in AWS Athena: [image: table schema]

The table has about 80 million rows and roughly 3,000 unique (standard_lab_parameter_name, units) pairs. I want 1000 random samples for each unique pair, so nearly 3,000 x 1000 rows in total. I searched the internet for such a query, but in vain. Any help?

You can use a CTE to generate random row numbers within each (standard_lab_parameter_name, units) pair, and then select the first 1000 rows for each pair by requiring the row number to be <= 1000:

WITH CTE AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY standard_lab_parameter_name, units ORDER BY RANDOM()) AS rn
    FROM yourtable
)
SELECT *
FROM CTE
WHERE rn <= 1000
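
To try the pattern without touching Athena, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in here (it happens to support ROW_NUMBER() and RANDOM() too). The `value` column, the sample rows, and the `sample_per_group` helper are made up for illustration, since the real schema is not shown, and the limit is 5 instead of 1000 to keep the data small.

```python
import sqlite3

def sample_per_group(conn, n):
    """Return up to n random rows per (standard_lab_parameter_name, units) pair."""
    query = """
        WITH CTE AS (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY standard_lab_parameter_name, units
                       ORDER BY RANDOM()
                   ) AS rn
            FROM yourtable
        )
        SELECT standard_lab_parameter_name, units, value
        FROM CTE
        WHERE rn <= ?
    """
    return conn.execute(query, (n,)).fetchall()

# Hypothetical toy table; "value" is an assumed column, not from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable "
    "(standard_lab_parameter_name TEXT, units TEXT, value REAL)"
)
rows = [("glucose", "mg/dL", float(i)) for i in range(50)]
rows += [("glucose", "mmol/L", float(i)) for i in range(3)]
rows += [("sodium", "mmol/L", float(i)) for i in range(20)]
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)

# Count how many rows were sampled for each pair.
sample = sample_per_group(conn, 5)
counts = {}
for name, units, _ in sample:
    counts[(name, units)] = counts.get((name, units), 0) + 1
print(counts)
```

Note that a pair with fewer rows than the limit (the three glucose/mmol/L rows here) simply returns all of its rows, so on the real table the result can be smaller than 3,000 x 1000.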
