
Calculate 7 Day Retention with SQL

Given the following tables:

users                                  page_views
+--------------+-----------+           +---------+-----------+
| user_id      | varchar   | <---+     | pv_id   | varchar   |
| reg_ts       | timestamp |     |     | pv_ts   | timestamp |
| reg_device   | varchar   |     +---> | user_id | varchar   |
| mktg_channel | varchar   |           | url     | varchar   |
+--------------+-----------+           | device  | varchar   |
                                       +---------+-----------+
  • Table "users" has one row per registered user.
  • Table "page_views" has one row per page view event.

What % of users who first visit on a given day came back again 1 week later?

I'm currently using SQLite and have created a sample database, but my output is off.

Below is what I have so far:

-- day 1 active users
SELECT *
FROM page_views
LEFT JOIN page_views AS future_page_views 
ON page_views.user_id = future_page_views.user_id
AND page_views.pv_ts = future_page_views.pv_ts - datetime(future_page_views.pv_ts, '+7 day')

-- day 7 retained users
SELECT 
  future_page_views.pv_ts,
  COUNT(DISTINCT page_views.user_id) as active_users,
  COUNT(DISTINCT future_page_views.user_id) as retained_users,
  CAST(COUNT(DISTINCT future_page_views.user_id) / COUNT(DISTINCT page_views.user_id) AS float) retention
FROM page_views
LEFT JOIN page_views as future_page_views 
ON page_views.user_id = future_page_views.user_id
AND page_views.pv_ts = future_page_views.pv_ts - datetime(page_views.pv_ts, '+7 day')
GROUP BY 1

I'm not sure whether I should use the strftime function (in place of DATEDIFF) in this instance to capture the 7-day window. Open to any suggestions and feedback, thanks in advance.

EDIT: Sample data is below. Based on this data set, I expect only user_id 8 to show up as 7-day retained (first day 2020-01-02, last day 2020-01-09).

[Image: Users / Page_Views sample data]

Desired output:

  • User_ID
  • p.pv_ts as First_Day
  • f.pv_ts as Last_Day
  • Retention days (i.e. 1, 2, 3, 4, 5 days...)
  • % of users who visited and came back on day 7

You can look at just the first two page visits for each user and then aggregate. This gives:

select user_id, min(pv_ts) as first_ts,
       nullif(max(pv_ts), min(pv_ts)) as second_ts
from (select pv.*,
             row_number() over (partition by user_id order by pv_ts) as seqnum
      from page_views pv
     ) pv
where seqnum <= 2
group by user_id;
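
To see what this per-user query returns, here is a minimal sketch that runs it through Python's built-in sqlite3 module against a tiny in-memory table. The sample rows are made up for illustration (only user 8's two dates come from the question), the url and device columns are left out for brevity, and window functions such as row_number() need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample rows; only user 8's two dates come from the question.
rows = [
    ("pv1", "2020-01-01 09:00:00", "7"),
    ("pv2", "2020-01-02 10:00:00", "8"),
    ("pv3", "2020-01-03 11:00:00", "7"),
    ("pv4", "2020-01-09 12:00:00", "8"),
    ("pv5", "2020-01-05 08:00:00", "9"),  # this user visits only once
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (pv_id TEXT, pv_ts TEXT, user_id TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)

# Same query as above, with an ORDER BY added so the output is deterministic.
first_two = conn.execute("""
    select user_id, min(pv_ts) as first_ts,
           nullif(max(pv_ts), min(pv_ts)) as second_ts
    from (select pv.*,
                 row_number() over (partition by user_id order by pv_ts) as seqnum
          from page_views pv
         ) pv
    where seqnum <= 2
    group by user_id
    order by user_id
""").fetchall()

for row in first_two:
    print(row)
```

A user with a single page view gets a NULL second_ts (None in Python), because nullif() returns NULL when the maximum and minimum timestamps are equal.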

Then to get the totals:

select count(*) as num_users,
       sum(case when second_ts < datetime(first_ts, '+7 day') then 1 else 0 end) as num_returned
from (select user_id, min(pv_ts) as first_ts,
             nullif(max(pv_ts), min(pv_ts)) as second_ts
      from (select pv.*,
                   row_number() over (partition by user_id order by pv_ts) as seqnum
            from page_views pv
           ) pv
      where seqnum <= 2
      group by user_id
     ) u;
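
The totals above count a second visit that falls at any point within the 7 days after the first one. If retention should instead mean a visit on exactly the seventh calendar day after the first visit (which is how the question's expected result for user 8 reads), one possible sketch compares dates rather than timestamps. The sample rows and the first_visits CTE name are assumptions for illustration, and the query is run through Python's built-in sqlite3 module so the example is self-contained.

```python
import sqlite3

# Hypothetical sample rows; only user 8's two dates come from the question.
rows = [
    ("pv1", "2020-01-01 09:00:00", "7"),
    ("pv2", "2020-01-03 11:00:00", "7"),  # back after 2 days, not on day 7
    ("pv3", "2020-01-02 10:00:00", "8"),
    ("pv4", "2020-01-09 12:00:00", "8"),  # back exactly 7 days later
    ("pv5", "2020-01-02 08:00:00", "9"),  # never came back
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (pv_id TEXT, pv_ts TEXT, user_id TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)

retention = conn.execute("""
    WITH first_visits AS (
        -- one row per user: the calendar day of the first page view
        SELECT user_id, date(min(pv_ts)) AS first_day
        FROM page_views
        GROUP BY user_id
    )
    SELECT fv.first_day,
           count(DISTINCT fv.user_id) AS active_users,
           count(DISTINCT pv.user_id) AS retained_users,
           round(100.0 * count(DISTINCT pv.user_id)
                 / count(DISTINCT fv.user_id), 1) AS retention_pct
    FROM first_visits fv
    LEFT JOIN page_views pv
           ON pv.user_id = fv.user_id
          AND date(pv.pv_ts) = date(fv.first_day, '+7 day')
    GROUP BY fv.first_day
    ORDER BY fv.first_day
""").fetchall()

for row in retention:
    print(row)
```

Multiplying by 100.0 forces floating-point division. The question's CAST(... / ... AS float) divides two integers first, so the result is truncated before the cast is applied. SQLite has no DATEDIFF function; the date() and datetime() modifiers used here (or a julianday() difference) cover the same need.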
