
Get 30 days prior data for each row of query

I have a query that returns a list of ~20k users who logged on to our site during a specific week of the month.

What I need for each of these users, looking at the 30 days prior to the date in the current where clause:

  1. whether they logged on, defined as any rows recorded in the same table
  2. the max event in that 30-day window

This is the current code snippet that narrows things down to the ~20k users for a given week:

select
   user_id,
   max(timestamp)
from table 
   where timestamp between '2019-02-01' and '2019-02-05'
group by 1;

Expected result set/columns:

  1. user_id
  2. max(timestamp)
  3. logged_on: whether they have any rows in the same table within the 30 days prior to their max(timestamp) date
  4. previous_timestamp: the 2nd most recent login date within the 30 days prior to their max(timestamp) date

I think this is what you're looking for. I'm not sure it's the most efficient method, though. Window functions may perform better, but as bob-mccormick mentioned, the tricky bit would be filling in dates where the user (the partition key) was not active so that the range query works correctly.
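As an aside on the windowing idea: if only the single most recent earlier login is needed (not a count), LAG might sidestep the gap-filling problem, because it looks at the previous row rather than a date range. The sketch below is not Snowflake code. It runs the idea against an in-memory SQLite database from Python so it can be checked locally. The sample rows, the function name `last_login_with_previous`, and the half-open week boundaries are all assumptions made for illustration. In Snowflake, `datetime(x, '-30 days')` would roughly correspond to `dateadd('day', -30, x)`, and the `rn = 1` filter could probably be written with QUALIFY.

```python
import sqlite3

# Hypothetical sample rows (userid, date_logged_on); not taken from the original post.
ROWS = [
    (1, "2019-01-20 09:00:00"),
    (1, "2019-02-03 10:00:00"),  # user 1: prior login 14 days earlier
    (2, "2018-12-01 08:00:00"),
    (2, "2019-02-02 12:00:00"),  # user 2: prior login is older than 30 days
    (3, "2019-02-04 07:30:00"),  # user 3: no prior login at all
    (4, "2019-01-15 07:30:00"),  # user 4: did not log in during the week
]

QUERY = """
with ordered as (
  select
    userid,
    date_logged_on,
    -- previous login of the same user, whatever its age
    lag(date_logged_on) over (partition by userid order by date_logged_on) as prev_login,
    -- rn = 1 marks each user's latest login before the end of the week
    row_number() over (partition by userid order by date_logged_on desc) as rn
  from user_logins
  where date_logged_on < :week_end
)
select
  userid,
  date_logged_on as last_login,
  case when prev_login > datetime(date_logged_on, '-30 days') then 1 else 0 end as logged_on,
  case when prev_login > datetime(date_logged_on, '-30 days') then prev_login end as previous_login
from ordered
where rn = 1 and date_logged_on >= :week_start
order by userid
"""


def last_login_with_previous(rows, week_start, week_end):
    """Return (userid, last_login, logged_on, previous_login) per user active in the week."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table user_logins (userid integer, date_logged_on text)")
        conn.executemany("insert into user_logins values (?, ?)", rows)
        params = {"week_start": week_start, "week_end": week_end}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in last_login_with_previous(ROWS, "2019-02-01 00:00:00", "2019-02-06 00:00:00"):
        print(row)
```

One caveat: if a user has two rows with exactly the same timestamp, LAG returns that same timestamp as the previous login, whereas the join shown further down excludes it with a strict `<` comparison. It also needs an SQLite build with window function support (3.25 or later).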

Example data setup (Snowflake syntax)

-- Create sample table
create temporary table user_logins (userid number, date_logged_on timestamp);

-- Insert some random sample data
insert overwrite into user_logins 
select 
    uniform(1,10,random()) userid, 
    dateadd('minutes', uniform(1,86400,random()) * -1,current_timestamp::timestamp_ntz) date_logged_on 
from table(generator(rowcount => 100))
;

Select statement

-- Run select
with user_last_logins as (
  select 
    userid,
    max(date_logged_on) last_login
  from user_logins
  where
    date_logged_on between '2019-01-01' and '2019-05-08'
  group by userid
)
select 
    user_last_logins.userid,
    max(user_last_logins.last_login) last_logged_on,
    count(prior_30_each_user.userid) num_logins_prior_30,
    -- most recent login in the 30 days before last_login (null if none)
    max(prior_30_each_user.date_logged_on) previous_login
from user_last_logins
left join user_logins prior_30_each_user
    on user_last_logins.userid = prior_30_each_user.userid
    and prior_30_each_user.date_logged_on > dateadd('day', -30, user_last_logins.last_login)
    and prior_30_each_user.date_logged_on < user_last_logins.last_login
group by user_last_logins.userid
;
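
Because the sample data above is random and relative to the current time, the output is hard to verify by eye. As a rough check of the join logic, here is the same idea run on a small fixed dataset in an in-memory SQLite database from Python. This is a sketch under assumptions, not Snowflake code: the rows, the function name `prior_30_summary`, the extra `logged_on` flag (derived as count > 0), and the half-open week filter are mine, and `datetime(x, '-30 days')` stands in for `dateadd('day', -30, x)`.

```python
import sqlite3

# Hypothetical sample rows (userid, date_logged_on); not taken from the original post.
ROWS = [
    (1, "2019-01-10 08:00:00"),
    (1, "2019-01-25 09:00:00"),
    (1, "2019-02-03 10:00:00"),  # user 1: two logins in the prior 30 days
    (2, "2018-12-01 08:00:00"),
    (2, "2019-02-02 12:00:00"),  # user 2: only an older login
]

QUERY = """
with user_last_logins as (
  select userid, max(date_logged_on) as last_login
  from user_logins
  where date_logged_on >= :week_start and date_logged_on < :week_end
  group by userid
)
select
  l.userid,
  l.last_login,
  count(p.userid) as num_logins_prior_30,
  count(p.userid) > 0 as logged_on,
  max(p.date_logged_on) as previous_login
from user_last_logins l
left join user_logins p
  on p.userid = l.userid
  and p.date_logged_on > datetime(l.last_login, '-30 days')
  and p.date_logged_on < l.last_login
group by l.userid, l.last_login
order by l.userid
"""


def prior_30_summary(rows, week_start, week_end):
    """Return (userid, last_login, num_logins_prior_30, logged_on, previous_login)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table user_logins (userid integer, date_logged_on text)")
        conn.executemany("insert into user_logins values (?, ?)", rows)
        params = {"week_start": week_start, "week_end": week_end}
        return conn.execute(QUERY, params).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in prior_30_summary(ROWS, "2019-02-01 00:00:00", "2019-02-06 00:00:00"):
        print(row)
```

The left join keeps users with no prior logins, so they come back with a count of 0 and a null previous login rather than disappearing from the result.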

