Calculate 7-Day Retention with SQL
Given the following tables:
users                                  page_views
+--------------+-----------+           +----------+-----------+
| user_id      | varchar   | <----+    | pv_id    | varchar   |
| reg_ts       | timestamp |      |    | pv_ts    | timestamp |
| reg_device   | varchar   |      +--> | user_id  | varchar   |
| mktg_channel | varchar   |           | url      | varchar   |
+--------------+-----------+           | device   | varchar   |
                                       +----------+-----------+
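For anyone who wants to reproduce this, here is a minimal sketch of the diagram as SQLite DDL, run through Python's built-in `sqlite3` module. The column types and the foreign key are read off the diagram; the `create_schema` helper name is hypothetical. SQLite has no real TIMESTAMP type, so the sketch assumes timestamps are stored as ISO-8601 text, which sorts and compares correctly as strings.

```python
import sqlite3

# DDL read off the diagram. VARCHAR/TIMESTAMP are accepted by SQLite as type
# names (they map to TEXT/NUMERIC affinity); timestamps are assumed to be
# stored as 'YYYY-MM-DD HH:MM:SS' text.
SCHEMA = """
CREATE TABLE users (
    user_id      VARCHAR PRIMARY KEY,
    reg_ts       TIMESTAMP,
    reg_device   VARCHAR,
    mktg_channel VARCHAR
);
CREATE TABLE page_views (
    pv_id   VARCHAR PRIMARY KEY,
    pv_ts   TIMESTAMP,
    user_id VARCHAR REFERENCES users(user_id),  -- the arrow in the diagram
    url     VARCHAR,
    device  VARCHAR
);
"""


def create_schema() -> sqlite3.Connection:
    """Hypothetical helper: create both tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


if __name__ == "__main__":
    conn = create_schema()
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    print(names)  # ['page_views', 'users']
```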
What percentage of users who first visit on a given day came back again 1 week later?
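One way to pin the metric down, assuming "1 week later" means a page view on the 7th calendar day after the user's first view (the reading that matches the expectation for user 8 in the EDIT further down). The symbols $V_u$, $F_d$ and $R_d$ are notation introduced here for illustration, not part of the original question:

```latex
V_u = \{\text{timestamps of user } u\text{'s page views}\}, \qquad
\operatorname{first}(u) = \operatorname{day}\bigl(\min V_u\bigr)

F_d = \{\, u : \operatorname{first}(u) = d \,\}, \qquad
R_d = \{\, u \in F_d : \exists\, t \in V_u,\ \operatorname{day}(t) = d + 7 \,\}

\operatorname{retention}_7(d) = 100 \cdot \frac{|R_d|}{|F_d|}\ \%
```

Counting any return within the first week instead would replace the condition in $R_d$ with something like $1 \le \operatorname{day}(t) - d \le 7$, which generally gives a different number.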
I'm currently using SQLite and have created a sample database, but my output is off.
Below is what I have so far:
-- day 1 active users
SELECT *
FROM page_views
LEFT JOIN page_views AS future_page_views
  ON page_views.user_id = future_page_views.user_id
 AND page_views.pv_ts = future_page_views.pv_ts - datetime(future_page_views.pv_ts, '+7 day');

-- day 7 retained users
SELECT
  future_page_views.pv_ts,
  COUNT(DISTINCT page_views.user_id) AS active_users,
  COUNT(DISTINCT future_page_views.user_id) AS retained_users,
  CAST(COUNT(DISTINCT future_page_views.user_id) / COUNT(DISTINCT page_views.user_id) AS float) AS retention
FROM page_views
LEFT JOIN page_views AS future_page_views
  ON page_views.user_id = future_page_views.user_id
 AND page_views.pv_ts = future_page_views.pv_ts - datetime(page_views.pv_ts, '+7 day')
GROUP BY 1;
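For comparison, a minimal sketch of the same self-join idea that does run in SQLite, assuming "retained" means a page view on the 7th calendar day after the user's first view. A few things in the attempt above work against it: `pv_ts - datetime(...)` subtracts two text values, which SQLite coerces to numbers rather than treating as dates, so the join never pairs a visit with the one a week later; `COUNT(...) / COUNT(...)` is integer division, so the CAST only sees an already-truncated 0 or 1; and `GROUP BY 1` groups by the future visit's timestamp rather than by the first-visit day. The sketch runs the SQL through Python's `sqlite3` module; the rows, the trimmed three-column `page_views` table, and the `day7_retention` helper are all hypothetical.

```python
import sqlite3

# Hypothetical rows (not the asker's data): user 8 returns exactly 7 days
# after the first visit, user 3 returns after 3 days, user 5 never returns.
ROWS = [
    ("pv1", "2020-01-02 09:00:00", "8"),
    ("pv2", "2020-01-09 10:30:00", "8"),
    ("pv3", "2020-01-02 11:00:00", "3"),
    ("pv4", "2020-01-05 12:00:00", "3"),
    ("pv5", "2020-01-03 08:00:00", "5"),
]

# first_visit pins each user to the calendar day of their earliest view.
# The LEFT JOIN keeps every user and matches only views on first_day + 7.
# 100.0 * ... forces floating-point division (COUNT / COUNT is integer).
RETENTION_SQL = """
WITH first_visit AS (
    SELECT user_id, date(MIN(pv_ts)) AS first_day
    FROM page_views
    GROUP BY user_id
)
SELECT f.first_day,
       COUNT(DISTINCT f.user_id)   AS active_users,
       COUNT(DISTINCT fut.user_id) AS retained_users,
       100.0 * COUNT(DISTINCT fut.user_id) / COUNT(DISTINCT f.user_id) AS retention_pct
FROM first_visit AS f
LEFT JOIN page_views AS fut
       ON fut.user_id = f.user_id
      AND date(fut.pv_ts) = date(f.first_day, '+7 days')
GROUP BY f.first_day
ORDER BY f.first_day
"""


def day7_retention(conn: sqlite3.Connection) -> list:
    """Return (first_day, active_users, retained_users, retention_pct) rows."""
    return conn.execute(RETENTION_SQL).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (pv_id TEXT, pv_ts TEXT, user_id TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", ROWS)

for row in day7_retention(conn):
    print(row)
# ('2020-01-02', 2, 1, 50.0)
# ('2020-01-03', 1, 0, 0.0)
```

COUNT(DISTINCT ...) ignores the NULLs produced by the LEFT JOIN, so users with no day-7 view still count as active but not as retained.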
Not sure whether I should use the strftime function (as a DATEDIFF equivalent) in this instance to capture the 7 days. Open to any suggestions and feedback; thanks in advance.
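On the strftime/DATEDIFF point: SQLite has no DATEDIFF function. Its date functions (`date()`, `datetime()`, `julianday()`, `strftime()`) accept modifiers such as `'+7 days'`, and subtracting two `julianday()` values gives the elapsed time in days. A small sketch via Python's `sqlite3` module, using hypothetical dates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No DATEDIFF in SQLite: the difference of two julianday() values is the
# elapsed time in (possibly fractional) days.
days_between = conn.execute(
    "SELECT julianday('2020-01-09') - julianday('2020-01-02')").fetchone()[0]

# Date arithmetic is done with modifiers on date()/datetime().
one_week_later = conn.execute(
    "SELECT date('2020-01-02', '+7 days')").fetchone()[0]

# To compare whole calendar days, strip the time part with date() first;
# the raw timestamps below would otherwise differ by 7.0625 days.
whole_days = conn.execute(
    "SELECT julianday(date('2020-01-09 10:30:00'))"
    " - julianday(date('2020-01-02 09:00:00'))").fetchone()[0]

print(days_between, one_week_later, whole_days)  # 7.0 2020-01-09 7.0
```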
EDIT: Sample data below. Based on this data set,
I expect only user_id 8 to show up as retained at 7 days (first visit 2020-01-02, last visit 2020-01-09).
Desired output:
You can look at just the first two page visits for each user and then aggregate. This gives each user's first and second visit timestamps (second_ts is NULL for users with only one visit):
select user_id, min(pv_ts) as first_ts,
       nullif(max(pv_ts), min(pv_ts)) as second_ts
from (select pv.*,
             row_number() over (partition by user_id order by pv_ts) as seqnum
      from page_views pv
     ) pv
where seqnum <= 2
group by user_id;
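The query above relies on `row_number()`, and SQLite only added window functions in version 3.25.0. A sketch of an equivalent first/second-visit table for older builds, using `MIN()` plus a correlated subquery, on hypothetical rows. One behavioral difference: if a user has two views with the identical earliest timestamp, the window version returns NULL for `second_ts` (because max equals min), whereas this version returns the next strictly later view.

```python
import sqlite3

# Hypothetical rows (not the asker's data).
ROWS = [
    ("pv1", "2020-01-02 09:00:00", "8"),
    ("pv2", "2020-01-09 10:30:00", "8"),
    ("pv3", "2020-01-02 11:00:00", "3"),
    ("pv4", "2020-01-05 12:00:00", "3"),
    ("pv5", "2020-01-03 08:00:00", "5"),
]

# first_ts comes from a plain GROUP BY; second_ts is the earliest view that is
# strictly later than first_ts (NULL when the user never came back).
FIRST_TWO_SQL = """
SELECT f.user_id,
       f.first_ts,
       (SELECT MIN(pv.pv_ts)
          FROM page_views AS pv
         WHERE pv.user_id = f.user_id
           AND pv.pv_ts > f.first_ts) AS second_ts
FROM (SELECT user_id, MIN(pv_ts) AS first_ts
        FROM page_views
       GROUP BY user_id) AS f
ORDER BY f.user_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (pv_id TEXT, pv_ts TEXT, user_id TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", ROWS)

first_two = conn.execute(FIRST_TWO_SQL).fetchall()
for row in first_two:
    print(row)
# ('3', '2020-01-02 11:00:00', '2020-01-05 12:00:00')
# ('5', '2020-01-03 08:00:00', None)
# ('8', '2020-01-02 09:00:00', '2020-01-09 10:30:00')
```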
Then, to get the totals:
select count(*) as users,
       sum(case when second_ts < datetime(first_ts, '+7 days') then 1 else 0 end) as returned_within_7_days
from (select user_id, min(pv_ts) as first_ts,
             nullif(max(pv_ts), min(pv_ts)) as second_ts
      from (select pv.*,
                   row_number() over (partition by user_id order by pv_ts) as seqnum
            from page_views pv
           ) pv
      where seqnum <= 2
      group by user_id
     ) u;
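The totals query returns counts rather than the percentage the question asks for, and comparing `second_ts` against `datetime(first_ts, ...)` measures a return within 7 days rather than a visit 1 week later. A sketch, on hypothetical rows and without window functions, that computes both readings side by side and turns the calendar-day one into a percentage; which reading is appropriate depends on how the retention metric is defined.

```python
import sqlite3

# Hypothetical rows (not the asker's data).
ROWS = [
    ("pv1", "2020-01-02 09:00:00", "8"),
    ("pv2", "2020-01-09 10:30:00", "8"),
    ("pv3", "2020-01-02 11:00:00", "3"),
    ("pv4", "2020-01-05 12:00:00", "3"),
    ("pv5", "2020-01-03 08:00:00", "5"),
]

# Per user, two 0/1 flags (EXISTS yields 0 or 1 in SQLite):
#   within_7d: any later view before first_ts + 7 days (7 x 24 hours)
#   on_day_7:  any view on the 7th calendar day after the first view
TOTALS_SQL = """
WITH first_visit AS (
    SELECT user_id, MIN(pv_ts) AS first_ts
    FROM page_views
    GROUP BY user_id
),
flags AS (
    SELECT f.user_id,
           EXISTS (SELECT 1 FROM page_views AS pv
                    WHERE pv.user_id = f.user_id
                      AND pv.pv_ts > f.first_ts
                      AND pv.pv_ts < datetime(f.first_ts, '+7 days')) AS within_7d,
           EXISTS (SELECT 1 FROM page_views AS pv
                    WHERE pv.user_id = f.user_id
                      AND date(pv.pv_ts) = date(f.first_ts, '+7 days')) AS on_day_7
    FROM first_visit AS f
)
SELECT COUNT(*)                         AS users,
       SUM(within_7d)                   AS within_7d,
       SUM(on_day_7)                    AS on_day_7,
       100.0 * SUM(on_day_7) / COUNT(*) AS pct_on_day_7
FROM flags
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (pv_id TEXT, pv_ts TEXT, user_id TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", ROWS)

users, within_7d, on_day_7, pct = conn.execute(TOTALS_SQL).fetchone()
print(users, within_7d, on_day_7, round(pct, 1))  # 3 1 1 33.3
```

With these rows, the within-7-days reading counts user 3 (back on 2020-01-05) but not user 8, whose 10:30 visit on 2020-01-09 lands just after the 09:00 cutoff; the calendar-day reading counts only user 8, which is the result the question's EDIT expects.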
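Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.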