Get 30 days prior data for each row of query
I have a query that returns a list of ~20k users who logged on to our site during a specific week of the month.
What I need, for each of these users, over the 30 days prior to the date in the current where clause: 1. whether they logged on, defined as any row recorded in the same table; 2. the max event in that 30-day window.
This is the current code snippet that narrows things down to the ~20k users for a given week to begin with:
select
user_id,
max(timestamp)
from table
where timestamp between '2019-02-01' and '2019-02-05'
group by 1;
Expected result set/columns:
I think this is what you're looking for. I'm not sure it's the most efficient method, though: window functions may perform better, but as bob-mccormick mentioned, the tricky bit would be filling in dates where the user (the partition key) was not active so that the range query works correctly.
Example data setup (Snowflake syntax)
-- Create sample table
create temporary table user_logins (userid number, date_logged_on timestamp);
-- Insert some random sample data
insert overwrite into user_logins
select
uniform(1,10,random()) userid,
dateadd('minutes', uniform(1,86400,random()) * -1,current_timestamp::timestamp_ntz) date_logged_on
from table(generator(rowcount => 100))
;
Select statement
-- Run select
with user_last_logins as (
select
userid,
max(date_logged_on) last_login
from user_logins
where
date_logged_on between '2019-01-01' and '2019-05-08'
group by userid
)
select
user_last_logins.userid,
max(user_last_logins.last_login) last_logged_on,
count(prior_30_each_user.userid) num_logins_prior_30,
max(prior_30_each_user.date_logged_on) last_logged_on_prior_30
from user_last_logins
left join user_logins prior_30_each_user
on user_last_logins.userid = prior_30_each_user.userid
and prior_30_each_user.date_logged_on > dateadd('day', -30, user_last_logins.last_login)
and prior_30_each_user.date_logged_on < user_last_logins.last_login
group by user_last_logins.userid
;
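Because the sample data above is random, it is hard to confirm by eye that the join picks up the right rows. The following is a minimal sketch, not the Snowflake query itself: it runs the same self-join against an in-memory SQLite database from Python with a few hand-picked rows. The rows, the function name prior_30_summary, and the column aliases are assumptions made up for illustration, and SQLite's datetime(..., '-30 days') stands in for Snowflake's dateadd.

```python
import sqlite3

# Hypothetical, hand-picked rows so the result can be checked by eye.
ROWS = [
    (1, "2019-02-03 10:00:00"),  # user 1: last login inside the reporting week
    (1, "2019-01-20 09:00:00"),  # within the prior 30 days, so counted
    (1, "2019-01-10 08:00:00"),  # within the prior 30 days, so counted
    (1, "2018-12-01 08:00:00"),  # older than 30 days, so ignored
    (2, "2019-02-02 12:00:00"),  # user 2: no earlier activity at all
    (3, "2019-01-15 12:00:00"),  # user 3: not active in the reporting week
]

# Same shape as the answer's query, written in SQLite's dialect.
QUERY = """
with user_last_logins as (
    select userid, max(date_logged_on) as last_login
    from user_logins
    where date_logged_on >= :win_start and date_logged_on < :win_end
    group by userid
)
select
    l.userid,
    l.last_login,
    count(p.userid) as num_logins_prior_30,
    max(p.date_logged_on) as last_logged_on_prior_30
from user_last_logins l
left join user_logins p
    on p.userid = l.userid
    and p.date_logged_on > datetime(l.last_login, '-30 days')
    and p.date_logged_on < l.last_login
group by l.userid, l.last_login
order by l.userid
"""


def prior_30_summary(rows, win_start, win_end):
    """Load rows into a throwaway SQLite table and run the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table user_logins (userid integer, date_logged_on text)")
    conn.executemany("insert into user_logins values (?, ?)", rows)
    result = conn.execute(
        QUERY, {"win_start": win_start, "win_end": win_end}
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in prior_30_summary(ROWS, "2019-02-01", "2019-02-06"):
        print(row)
```

With these rows, user 1 comes back with 2 prior logins (the most recent on 2019-01-20), user 2 comes back with 0 and a null max thanks to the left join, and user 3 is absent because they were not active in the reporting week. Timestamps are stored as ISO-8601 text here, which SQLite compares correctly as strings.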