How to enforce a maximum number of rows per day in SQL?
Given data that looks as follows, where the date is a string in YYYYMMDD format:
item | vietnamese | cost | unique_id | sales_date |
---|---|---|---|---|
fruits | trai cay | 10 | abc123 | 20211001 |
fruits | trai cay | 8 | foo99 | 20211001 |
fruits | trai cay | 9 | foo99 | 20211001 |
vege | rau | 3 | rr1239 | 20211001 |
vege | rau | 3 | rr1239 | 20211001 |
fruits | trai cay | 12 | abc123 | 20211002 |
fruits | trai cay | 14 | abc123 | 20211002 |
fruits | trai cay | 8 | abc123 | 20211002 |
fruits | trai cay | 5 | foo99 | 20211002 |
vege | rau | 8 | rr1239 | 20211002 |
vege | rau | 1 | rr1239 | 20211002 |
vege | rau | 12 | ud9213 | 20211002 |
vege | rau | 19 | r11759 | 20211002 |
fruits | trai cay | 6 | foo99 | 20211003 |
fruits | trai cay | 2 | abc123 | 20211003 |
fruits | trai cay | 12 | abc123 | 20211003 |
vege | rau | 1 | ud97863 | 20211003 |
vege | rau | 9 | r112359 | 20211003 |
fruits | trai cay | 6 | foo99 | 20211004 |
fruits | trai cay | 2 | abc123 | 20211004 |
fruits | trai cay | 12 | abc123 | 20211004 |
vege | rau | 9 | r112359 | 20211004 |
The goal is to sample rows within a certain time frame, e.g. 2021-10-02 to 2021-10-03, and to extract a maximum of 3 rows per day, e.g. with a query like this:
SELECT * FROM mytable
WHERE sales_date BETWEEN '20211002' AND '20211003'
ORDER BY RAND() LIMIT 6
The expected output for the table above is:
item | vietnamese | cost | unique_id | sales_date |
---|---|---|---|---|
fruits | trai cay | 8 | abc123 | 20211002 |
fruits | trai cay | 5 | foo99 | 20211002 |
vege | rau | 8 | rr1239 | 20211002 |
fruits | trai cay | 12 | abc123 | 20211003 |
vege | rau | 1 | ud97863 | 20211003 |
vege | rau | 9 | r112359 | 20211003 |
But it is possible that all 6 returned rows come from a single day:
item | vietnamese | cost | unique_id | sales_date |
---|---|---|---|---|
fruits | trai cay | 12 | abc123 | 20211002 |
fruits | trai cay | 14 | abc123 | 20211002 |
fruits | trai cay | 8 | abc123 | 20211002 |
fruits | trai cay | 5 | foo99 | 20211002 |
vege | rau | 8 | rr1239 | 20211002 |
vege | rau | 1 | rr1239 | 20211002 |
So, to ensure that I get at most 3 rows per day, I'm running one query per day, i.e.
SELECT * FROM mytable
WHERE sales_date='20211002'
ORDER BY RAND() LIMIT 3
and
SELECT * FROM mytable
WHERE sales_date='20211003'
ORDER BY RAND() LIMIT 3
Is there a way to enforce a maximum of N rows per day in a single query?
Otherwise, is there a way to combine those one-per-day queries into a "super-query"? Over a full year, that would be 365 queries, one per day.
Since 6 rows over 2 days means exactly 3 rows per day, let's expand it to a week.
In a subquery, use row_number to assign a number to each row within each date. Then select only the rows with a row number of 3 or less.
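select *
from (
  select
    *,
    -- number the rows of each day in random order;
    -- `row` is a reserved word in some databases (e.g. MySQL 8), hence `rn`
    row_number() over (partition by sales_date order by rand()) as rn
  from mytable
  where sales_date between '20211002' and '20211009'
) as t  -- some databases (e.g. MySQL) require an alias on a derived table
where rn <= 3
order by rand()
limit 6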
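To try the row_number approach on the sample data without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It rests on a few assumptions that are not in the original post: SQLite (version 3.25 or newer, for window functions) stands in for the unspecified database, so `random()` replaces `rand()`; the helper names `make_db` and `sample_per_day` are hypothetical; and a negative `LIMIT` is used as SQLite's way of saying "no overall limit".

```python
import sqlite3
from collections import Counter

# Sample rows from the question: (item, vietnamese, cost, unique_id, sales_date).
ROWS = [
    ("fruits", "trai cay", 10, "abc123", "20211001"),
    ("fruits", "trai cay", 8, "foo99", "20211001"),
    ("fruits", "trai cay", 9, "foo99", "20211001"),
    ("vege", "rau", 3, "rr1239", "20211001"),
    ("vege", "rau", 3, "rr1239", "20211001"),
    ("fruits", "trai cay", 12, "abc123", "20211002"),
    ("fruits", "trai cay", 14, "abc123", "20211002"),
    ("fruits", "trai cay", 8, "abc123", "20211002"),
    ("fruits", "trai cay", 5, "foo99", "20211002"),
    ("vege", "rau", 8, "rr1239", "20211002"),
    ("vege", "rau", 1, "rr1239", "20211002"),
    ("vege", "rau", 12, "ud9213", "20211002"),
    ("vege", "rau", 19, "r11759", "20211002"),
    ("fruits", "trai cay", 6, "foo99", "20211003"),
    ("fruits", "trai cay", 2, "abc123", "20211003"),
    ("fruits", "trai cay", 12, "abc123", "20211003"),
    ("vege", "rau", 1, "ud97863", "20211003"),
    ("vege", "rau", 9, "r112359", "20211003"),
    ("fruits", "trai cay", 6, "foo99", "20211004"),
    ("fruits", "trai cay", 2, "abc123", "20211004"),
    ("fruits", "trai cay", 12, "abc123", "20211004"),
    ("vege", "rau", 9, "r112359", "20211004"),
]


def make_db():
    """Build an in-memory SQLite table holding the sample rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (item TEXT, vietnamese TEXT, cost INTEGER,"
        " unique_id TEXT, sales_date TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def sample_per_day(conn, start, end, per_day, total=-1):
    """Return at most `per_day` random rows per sales_date in [start, end].

    `total` caps the overall row count; SQLite treats a negative LIMIT
    as "no limit", which is an SQLite-specific assumption.
    """
    sql = """
        SELECT item, vietnamese, cost, unique_id, sales_date
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (
                       PARTITION BY sales_date ORDER BY random()
                   ) AS rn
            FROM mytable
            WHERE sales_date BETWEEN ? AND ?
        ) AS t
        WHERE rn <= ?
        ORDER BY random()
        LIMIT ?
    """
    return conn.execute(sql, (start, end, per_day, total)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    rows = sample_per_day(conn, "20211002", "20211009", per_day=3, total=6)
    for row in rows:
        print(row)
    # Rows per day in the sample; no day should exceed 3.
    print(Counter(r[4] for r in rows))
```

Because the per-day cap is applied inside the subquery before the outer `LIMIT`, the final sample can never contain more than 3 rows from one day, although which rows are picked changes from run to run.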