How to enforce a maximum number of rows per date in SQL?

Given data that looks as follows, where the date is a string in YYYYMMDD format:

| item   | vietnamese | cost | unique_id | sales_date |
|--------|------------|------|-----------|------------|
| fruits | trai cay   | 10   | abc123    | 20211001   |
| fruits | trai cay   | 8    | foo99     | 20211001   |
| fruits | trai cay   | 9    | foo99     | 20211001   |
| vege   | rau        | 3    | rr1239    | 20211001   |
| vege   | rau        | 3    | rr1239    | 20211001   |
| fruits | trai cay   | 12   | abc123    | 20211002   |
| fruits | trai cay   | 14   | abc123    | 20211002   |
| fruits | trai cay   | 8    | abc123    | 20211002   |
| fruits | trai cay   | 5    | foo99     | 20211002   |
| vege   | rau        | 8    | rr1239    | 20211002   |
| vege   | rau        | 1    | rr1239    | 20211002   |
| vege   | rau        | 12   | ud9213    | 20211002   |
| vege   | rau        | 19   | r11759    | 20211002   |
| fruits | trai cay   | 6    | foo99     | 20211003   |
| fruits | trai cay   | 2    | abc123    | 20211003   |
| fruits | trai cay   | 12   | abc123    | 20211003   |
| vege   | rau        | 1    | ud97863   | 20211003   |
| vege   | rau        | 9    | r112359   | 20211003   |
| fruits | trai cay   | 6    | foo99     | 20211004   |
| fruits | trai cay   | 2    | abc123    | 20211004   |
| fruits | trai cay   | 12   | abc123    | 20211004   |
| vege   | rau        | 9    | r112359   | 20211004   |

The goal is to sample rows within a certain time frame, e.g. 2021-10-02 to 2021-10-03, and to extract a maximum of 3 rows per day, e.g. with this query:

SELECT * FROM mytable
WHERE sales_date BETWEEN '20211002' AND '20211003'
ORDER BY RAND() LIMIT 6

The expected output for the table above is:

| item   | vietnamese | cost | unique_id | sales_date |
|--------|------------|------|-----------|------------|
| fruits | trai cay   | 8    | abc123    | 20211002   |
| fruits | trai cay   | 5    | foo99     | 20211002   |
| vege   | rau        | 8    | rr1239    | 20211002   |
| fruits | trai cay   | 12   | abc123    | 20211003   |
| vege   | rau        | 1    | ud97863   | 20211003   |
| vege   | rau        | 9    | r112359   | 20211003   |

But it is possible that all 6 rows returned come from a single day:

| item   | vietnamese | cost | unique_id | sales_date |
|--------|------------|------|-----------|------------|
| fruits | trai cay   | 12   | abc123    | 20211002   |
| fruits | trai cay   | 14   | abc123    | 20211002   |
| fruits | trai cay   | 8    | abc123    | 20211002   |
| fruits | trai cay   | 5    | foo99     | 20211002   |
| vege   | rau        | 8    | rr1239    | 20211002   |
| vege   | rau        | 1    | rr1239    | 20211002   |

So to ensure that I get at most 3 rows per day, I'm running a separate query for each day, i.e.

SELECT * FROM mytable
WHERE sales_date = '20211002'
ORDER BY RAND() LIMIT 3

and

SELECT * FROM mytable
WHERE sales_date = '20211003'
ORDER BY RAND() LIMIT 3

Is there a way to enforce a maximum of N rows per day in a single query?

Otherwise, is there a way to combine those one-per-day queries into a "super-query"? If we're talking about a full year, it'll be 365 queries, one per day.

Since 6 rows over 2 days with at most 3 rows per day means exactly 3 rows per day, let's expand the range to a week.

In a subquery, use row_number() to assign a number to each row within each date, in random order. Then select only the rows whose row number is 3 or less.

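select *
from (
  select
    *,
    -- number the rows within each date, in random order
    row_number() over (partition by sales_date order by rand()) as rn
  from mytable
  where sales_date between '20211002' and '20211009'
) as t  -- some engines (e.g. MySQL) require an alias on a derived table
-- keep at most 3 rows per date; the alias is rn because ROW is a reserved word in some dialects
where rn <= 3
order by rand()
limit 6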
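
To try the idea without a database server, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. It is an illustration under a few assumptions, not the setup from the question: SQLite spells the random function RANDOM() rather than RAND(), window functions need SQLite 3.25 or newer, and the helper name `sample_per_day` and the alias `rn` are made up for this example.

```python
import sqlite3
from collections import Counter

# Sample rows copied from the question:
# (item, vietnamese, cost, unique_id, sales_date)
ROWS = [
    ("fruits", "trai cay", 10, "abc123", "20211001"),
    ("fruits", "trai cay", 8, "foo99", "20211001"),
    ("fruits", "trai cay", 9, "foo99", "20211001"),
    ("vege", "rau", 3, "rr1239", "20211001"),
    ("vege", "rau", 3, "rr1239", "20211001"),
    ("fruits", "trai cay", 12, "abc123", "20211002"),
    ("fruits", "trai cay", 14, "abc123", "20211002"),
    ("fruits", "trai cay", 8, "abc123", "20211002"),
    ("fruits", "trai cay", 5, "foo99", "20211002"),
    ("vege", "rau", 8, "rr1239", "20211002"),
    ("vege", "rau", 1, "rr1239", "20211002"),
    ("vege", "rau", 12, "ud9213", "20211002"),
    ("vege", "rau", 19, "r11759", "20211002"),
    ("fruits", "trai cay", 6, "foo99", "20211003"),
    ("fruits", "trai cay", 2, "abc123", "20211003"),
    ("fruits", "trai cay", 12, "abc123", "20211003"),
    ("vege", "rau", 1, "ud97863", "20211003"),
    ("vege", "rau", 9, "r112359", "20211003"),
    ("fruits", "trai cay", 6, "foo99", "20211004"),
    ("fruits", "trai cay", 2, "abc123", "20211004"),
    ("fruits", "trai cay", 12, "abc123", "20211004"),
    ("vege", "rau", 9, "r112359", "20211004"),
]


def sample_per_day(conn, start, end, per_day, total):
    """Return up to `total` random rows, with at most `per_day` rows per sales_date."""
    query = """
        SELECT item, vietnamese, cost, unique_id, sales_date
        FROM (
            SELECT
                *,
                -- number the rows within each date, in random order
                ROW_NUMBER() OVER (PARTITION BY sales_date ORDER BY RANDOM()) AS rn
            FROM mytable
            WHERE sales_date BETWEEN ? AND ?
        ) AS t
        WHERE rn <= ?
        ORDER BY RANDOM()
        LIMIT ?
    """
    return conn.execute(query, (start, end, per_day, total)).fetchall()


# Build a throwaway in-memory table with the question's column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "item TEXT, vietnamese TEXT, cost INTEGER, unique_id TEXT, sales_date TEXT)"
)
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)

# Two days, at most 3 rows per day, 6 rows overall.
sample = sample_per_day(conn, "20211002", "20211003", per_day=3, total=6)
print(Counter(row[4] for row in sample))
```

With the two-day range, the inner query keeps exactly three rows for each date, so the outer LIMIT 6 can no longer return six rows from a single day. Which three rows are picked changes from run to run.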
