简体   繁体   中英

How to enforce a max limit no. of rows per day per date in SQL?

Given a data that looks as follows where the date is in string format YYYYMMDD :

item vietnamese cost unique_id sales_date
fruits trai cay 10 abc123 20211001
fruits trai cay 8 foo99 20211001
fruits trai cay 9 foo99 20211001
vege rau 3 rr1239 20211001
vege rau 3 rr1239 20211001
fruits trai cay 12 abc123 20211002
fruits trai cay 14 abc123 20211002
fruits trai cay 8 abc123 20211002
fruits trai cay 5 foo99 20211002
vege rau 8 rr1239 20211002
vege rau 1 rr1239 20211002
vege rau 12 ud9213 20211002
vege rau 19 r11759 20211002
fruits trai cay 6 foo99 20211003
fruits trai cay 2 abc123 20211003
fruits trai cay 12 abc123 20211003
vege rau 1 ud97863 20211003
vege rau 9 r112359 20211003
fruits trai cay 6 foo99 20211004
fruits trai cay 2 abc123 20211004
fruits trai cay 12 abc123 20211004
vege rau 9 r112359 20211004

The goal is sample all the rows within a certain time frame, eg 2020-10-02 to 2020-10-03 and to extract a maximum of 3 rows per day, eg with this query:

SELECT * FROM mytable
WHERE sales_date BETWEEN '20211002' AND '20211003'
ORDER BY RAND () LIMIT 6

the expected output for the table above is:

item vietnamese cost unique_id sales_date
fruits trai cay 8 abc123 20211002
fruits trai cay 5 foo99 20211002
vege rau 8 rr1239 20211002
fruits trai cay 12 abc123 20211003
vege rau 1 ud97863 20211003
vege rau 9 r112359 20211003

But there is a possibility that all 6 rows expected comes from a single day:

item vietnamese cost unique_id sales_date
fruits trai cay 12 abc123 20211002
fruits trai cay 14 abc123 20211002
fruits trai cay 8 abc123 20211002
fruits trai cay 5 foo99 20211002
vege rau 8 rr1239 20211002
vege rau 1 rr1239 20211002

So to ensure that I have max 3 rows a day, I'm running multiple queries per day, ie

SELECT * FROM mytable
WHERE sales_date='20211002'
ORDER BY RAND () LIMIT 3

and

SELECT * FROM mytable
WHERE sales_date='20211003'
ORDER BY RAND () LIMIT 3

Is there a way to ensure N no. of max limit rows per day in a single query?

Otherwise is there a way to combine those one query per day into a "super-query"? If we're talking about a full year, it'll 365 queries, one per day.

Since 6 rows over 2 days means exactly 3 rows per day, let's expand it to a week.

In a subquery use row_number to assign a number to each row for each date. Then only select those with a row number of 3 or less.

select *
from (
  select
    *,
    row_number() over (partition by sales_date order by rand()) as row
  from mytable
  where sales_date between '20211002' and '20211009'
)
where row <= 3
order by rand()
limit 6

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM