
How do I calculate Survival Rate in SQL?


(The dialect can be Vertica, Impala, or Databricks.)

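I'm trying to calculate the day-0, day-1, ... up to day-7 survival rates of users. I treat every user who logged in on a given date as d0 (whether they are new or returning users) and look at how many of them come back on d1, d2, and so on. Suppose we have the following data: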

user | login_date
-----------------
001  | 2019-11-01
002  | 2019-11-01
003  | 2019-11-01
004  | 2019-11-01
005  | 2019-11-01
001  | 2019-11-02
003  | 2019-11-02
004  | 2019-11-02
006  | 2019-11-02
007  | 2019-11-02
002  | 2019-11-03
003  | 2019-11-03
004  | 2019-11-03
005  | 2019-11-03
008  | 2019-11-03
001  | 2019-11-04
002  | 2019-11-04
006  | 2019-11-04
007  | 2019-11-04
009  | 2019-11-04

I'd like to see something like this:

date      |d0 |d1 |d2 |d3
--------------------------
2019-11-01| 5 | 3 | 4 | 2
2019-11-02| 5 | 2 | 3 | 
2019-11-03| 5 | 1
2019-11-04| 5

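So you can see that d0 is 5 (even though some of those users had logged in before). For example, on 2019-11-02 we have 001, 003, 004, 006, and 007, and 2 of them came back the next day.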
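To pin down the definition before touching SQL, here is a minimal sketch in plain Python (standard library only) that builds the matrix from the sample data: for each date D the cohort is everyone active on D, and dN counts how many of them are also active on D + N days. The function name `survival_matrix` is hypothetical, and offsets past the last date in the data are left off, mirroring the blank cells in the desired output.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample data from the question: (user, login_date)
LOGINS = [
    ("001", "2019-11-01"), ("002", "2019-11-01"), ("003", "2019-11-01"),
    ("004", "2019-11-01"), ("005", "2019-11-01"),
    ("001", "2019-11-02"), ("003", "2019-11-02"), ("004", "2019-11-02"),
    ("006", "2019-11-02"), ("007", "2019-11-02"),
    ("002", "2019-11-03"), ("003", "2019-11-03"), ("004", "2019-11-03"),
    ("005", "2019-11-03"), ("008", "2019-11-03"),
    ("001", "2019-11-04"), ("002", "2019-11-04"), ("006", "2019-11-04"),
    ("007", "2019-11-04"), ("009", "2019-11-04"),
]


def survival_matrix(logins, max_day=7):
    """Hypothetical helper: the cohort for date D is every user active on D
    (new or old); dN is how many of that cohort are also active on D + N."""
    active = defaultdict(set)  # date -> users who logged in that day
    for user, day in logins:
        active[date.fromisoformat(day)].add(user)
    last = max(active)
    result = {}
    for d0 in sorted(active):
        cohort = active[d0]
        row = []
        for n in range(max_day + 1):
            day_n = d0 + timedelta(days=n)
            if day_n > last:  # no data yet for this offset: leave the cell empty
                break
            row.append(len(cohort & active.get(day_n, set())))
        result[d0.isoformat()] = row
    return result


for day, row in survival_matrix(LOGINS).items():
    print(day, row)
```

Dividing each dN by d0 turns these counts into rates, for example 3 / 5 = 0.6 for the 2019-11-01 cohort on d1.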

I've written a query that comes close to what I want, but it isn't quite the same.

WITH cte1 AS (
    SELECT
        user, 
        login_date,
        -- each user's first-ever login date
        FIRST_VALUE(login_date) OVER (PARTITION BY user ORDER BY login_date) AS first_login_day
    FROM
        table  -- placeholder for the real table name
),
cte2 AS (
    -- Standard SQL does not let a SELECT list reference an alias defined in the
    -- same list (some engines allow it), so the day offset is computed here.
    -- Impala/Databricks: DATEDIFF(end, start); Vertica: DATEDIFF('day', start, end)
    SELECT
        first_login_day,
        DATEDIFF(login_date, first_login_day) AS days_since_first_play
    FROM
        cte1
)
SELECT
    first_login_day,
    SUM(CASE WHEN days_since_first_play = 0 THEN 1 ELSE 0 END) AS d0,
    SUM(CASE WHEN days_since_first_play = 1 THEN 1 ELSE 0 END) AS d1,
    SUM(CASE WHEN days_since_first_play = 2 THEN 1 ELSE 0 END) AS d2,
    SUM(CASE WHEN days_since_first_play = 3 THEN 1 ELSE 0 END) AS d3,
    SUM(CASE WHEN days_since_first_play = 4 THEN 1 ELSE 0 END) AS d4,
    SUM(CASE WHEN days_since_first_play = 5 THEN 1 ELSE 0 END) AS d5,
    SUM(CASE WHEN days_since_first_play = 6 THEN 1 ELSE 0 END) AS d6,
    SUM(CASE WHEN days_since_first_play = 7 THEN 1 ELSE 0 END) AS d7
FROM
    cte2
GROUP BY
    first_login_day
ORDER BY
    first_login_day

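The problem with this query is that it drops returning players from the date I'm looking at. For example, with the same data, because 001, 003, and 004 already logged in on 2019-11-01, the d0 value for 2019-11-02 would be 2 instead of 5. So this query only works if I'm looking at new users only.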
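The behaviour described above can be reproduced with a small sketch that runs the question's first-login query against the sample data in SQLite through Python's built-in `sqlite3` module. Assumptions: `logins(user_id, login_date)` is a hypothetical stand-in for `table`, a `julianday()` difference replaces `DATEDIFF`, and the bundled SQLite must be 3.25 or newer for window functions.

```python
import sqlite3

# Sample data from the question; the table and column names are hypothetical.
LOGINS = [
    ("001", "2019-11-01"), ("002", "2019-11-01"), ("003", "2019-11-01"),
    ("004", "2019-11-01"), ("005", "2019-11-01"),
    ("001", "2019-11-02"), ("003", "2019-11-02"), ("004", "2019-11-02"),
    ("006", "2019-11-02"), ("007", "2019-11-02"),
    ("002", "2019-11-03"), ("003", "2019-11-03"), ("004", "2019-11-03"),
    ("005", "2019-11-03"), ("008", "2019-11-03"),
    ("001", "2019-11-04"), ("002", "2019-11-04"), ("006", "2019-11-04"),
    ("007", "2019-11-04"), ("009", "2019-11-04"),
]

# The question's approach: every login is bucketed by the user's FIRST login date.
# SQLite has no DATEDIFF, so a julianday() difference stands in for it.
FIRST_LOGIN_QUERY = """
WITH cte1 AS (
    SELECT user_id,
           login_date,
           FIRST_VALUE(login_date) OVER (
               PARTITION BY user_id ORDER BY login_date) AS first_login_day
    FROM logins
),
cte2 AS (
    SELECT first_login_day,
           CAST(julianday(login_date) - julianday(first_login_day) AS INTEGER)
               AS days_since_first_play
    FROM cte1
)
SELECT first_login_day,
       SUM(CASE WHEN days_since_first_play = 0 THEN 1 ELSE 0 END) AS d0,
       SUM(CASE WHEN days_since_first_play = 1 THEN 1 ELSE 0 END) AS d1,
       SUM(CASE WHEN days_since_first_play = 2 THEN 1 ELSE 0 END) AS d2,
       SUM(CASE WHEN days_since_first_play = 3 THEN 1 ELSE 0 END) AS d3
FROM cte2
GROUP BY first_login_day
ORDER BY first_login_day
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id TEXT, login_date TEXT)")
conn.executemany("INSERT INTO logins VALUES (?, ?)", LOGINS)
rows = conn.execute(FIRST_LOGIN_QUERY).fetchall()
for row in rows:
    print(row)
```

Because 001, 003, and 004 are assigned to the 2019-11-01 cohort, the 2019-11-02 row only sees 006 and 007, so its d0 is 2 and the later columns count just those two users.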

Is it possible to change the query to get what I want? Thanks in advance!

Here's an admittedly ugly way. The idea is to flag each user_id as a day-1, day-2, etc. returner, and then aggregate by login_date. I'd love to see a better way to do this.

with offsets as (
-- one row per login, flagged 1/0 by whether the same user also logged in
-- exactly 1, 2 and 3 days later (date + N adds N days in Vertica;
-- Impala and Databricks also provide DATE_ADD(date, N))
select a.user_id
    , a.login_date
    , case when b.login_date is not null then 1 else 0 end day_plus_one
    , case when c.login_date is not null then 1 else 0 end day_plus_two
    , case when d.login_date is not null then 1 else 0 end day_plus_three
from table a
    left join table b
        on b.user_id = a.user_id
        and b.login_date  = a.login_date+1
    left join table c
        on c.user_id = a.user_id
        and c.login_date  = a.login_date+2
    left join table d
        on d.user_id = a.user_id
        and d.login_date  = a.login_date+3
)
select 
    login_date
    , count(distinct user_id) day_zero_logins
    , sum(day_plus_one) day_one_logins
    , sum(day_plus_two) day_two_logins
    , sum(day_plus_three) day_three_logins
from offsets
group by login_date
order by login_date

Verified that it works on the OP's sample data.
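One way to re-run that check locally is the sketch below, which executes this answer's query in SQLite via Python's `sqlite3`. It assumes a hypothetical `logins` table and swaps `a.login_date+1` for SQLite's `date(a.login_date, '+1 days')`, because SQLite stores these dates as ISO text.

```python
import sqlite3

# Sample data from the question; the table name `logins` is hypothetical.
LOGINS = [
    ("001", "2019-11-01"), ("002", "2019-11-01"), ("003", "2019-11-01"),
    ("004", "2019-11-01"), ("005", "2019-11-01"),
    ("001", "2019-11-02"), ("003", "2019-11-02"), ("004", "2019-11-02"),
    ("006", "2019-11-02"), ("007", "2019-11-02"),
    ("002", "2019-11-03"), ("003", "2019-11-03"), ("004", "2019-11-03"),
    ("005", "2019-11-03"), ("008", "2019-11-03"),
    ("001", "2019-11-04"), ("002", "2019-11-04"), ("006", "2019-11-04"),
    ("007", "2019-11-04"), ("009", "2019-11-04"),
]

# The answer's query; date(x, '+N days') replaces x + N for SQLite text dates.
OFFSETS_QUERY = """
WITH offsets AS (
    SELECT a.user_id,
           a.login_date,
           CASE WHEN b.login_date IS NOT NULL THEN 1 ELSE 0 END AS day_plus_one,
           CASE WHEN c.login_date IS NOT NULL THEN 1 ELSE 0 END AS day_plus_two,
           CASE WHEN d.login_date IS NOT NULL THEN 1 ELSE 0 END AS day_plus_three
    FROM logins a
    LEFT JOIN logins b
           ON b.user_id = a.user_id AND b.login_date = date(a.login_date, '+1 days')
    LEFT JOIN logins c
           ON c.user_id = a.user_id AND c.login_date = date(a.login_date, '+2 days')
    LEFT JOIN logins d
           ON d.user_id = a.user_id AND d.login_date = date(a.login_date, '+3 days')
)
SELECT login_date,
       COUNT(DISTINCT user_id) AS day_zero_logins,
       SUM(day_plus_one)       AS day_one_logins,
       SUM(day_plus_two)       AS day_two_logins,
       SUM(day_plus_three)     AS day_three_logins
FROM offsets
GROUP BY login_date
ORDER BY login_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id TEXT, login_date TEXT)")
conn.executemany("INSERT INTO logins VALUES (?, ?)", LOGINS)
rows = conn.execute(OFFSETS_QUERY).fetchall()
for row in rows:
    print(row)
```

Offsets that fall past the last date in the data come back as 0 rather than blank; otherwise the rows match the desired output.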

A few self LEFT JOINs plus distinct user counts will give you this result.

-- t0 is the cohort day; t1..t3 are the same user's logins 1, 2 and 3 days later.
-- COUNT(DISTINCT ...) skips the NULLs left by unmatched LEFT JOINs.
SELECT t0.login_date,
COUNT(distinct t0.user) as d0,
COUNT(distinct t1.user) as d1,
COUNT(distinct t2.user) as d2,
COUNT(distinct t3.user) as d3
FROM table t0
LEFT JOIN table t1 
  ON t1.user = t0.user
 AND t1.login_date = t0.login_date + 1
LEFT JOIN table t2 
  ON t2.user = t0.user
 AND t2.login_date = t0.login_date + 2
LEFT JOIN table t3 
  ON t3.user = t0.user
 AND t3.login_date = t0.login_date + 3
GROUP BY t0.login_date
ORDER BY t0.login_date
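Both answers need one join per offset, which gets long for d0 through d7. A possible alternative, not taken from either answer, is a single self-join bounded to a seven-day window plus conditional `COUNT(DISTINCT CASE ...)` aggregation. The sketch below shows it in SQLite via Python's `sqlite3`; the `logins` table, the `day_n` column, and the `MAX_DAY` constant are hypothetical, and `julianday()`/`date()` stand in for whatever date functions the target engine provides (for example `DATEDIFF`).

```python
import sqlite3

# Sample data from the question; table and column names are hypothetical.
LOGINS = [
    ("001", "2019-11-01"), ("002", "2019-11-01"), ("003", "2019-11-01"),
    ("004", "2019-11-01"), ("005", "2019-11-01"),
    ("001", "2019-11-02"), ("003", "2019-11-02"), ("004", "2019-11-02"),
    ("006", "2019-11-02"), ("007", "2019-11-02"),
    ("002", "2019-11-03"), ("003", "2019-11-03"), ("004", "2019-11-03"),
    ("005", "2019-11-03"), ("008", "2019-11-03"),
    ("001", "2019-11-04"), ("002", "2019-11-04"), ("006", "2019-11-04"),
    ("007", "2019-11-04"), ("009", "2019-11-04"),
]
MAX_DAY = 7  # hypothetical setting: report d0 through d7

# One conditional COUNT(DISTINCT ...) per offset, generated instead of typed out.
day_columns = ",\n       ".join(
    f"COUNT(DISTINCT CASE WHEN day_n = {n} THEN user_id END) AS d{n}"
    for n in range(MAX_DAY + 1)
)

# Single self-join: pair each login (the cohort day) with the same user's
# logins in the next MAX_DAY days, tagged with the day offset.
WINDOW_QUERY = f"""
WITH pairs AS (
    SELECT t0.login_date AS cohort_date,
           t0.user_id,
           CAST(julianday(t.login_date) - julianday(t0.login_date) AS INTEGER) AS day_n
    FROM logins t0
    JOIN logins t
      ON t.user_id = t0.user_id
     AND t.login_date BETWEEN t0.login_date
                          AND date(t0.login_date, '+{MAX_DAY} days')
)
SELECT cohort_date,
       {day_columns}
FROM pairs
GROUP BY cohort_date
ORDER BY cohort_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id TEXT, login_date TEXT)")
conn.executemany("INSERT INTO logins VALUES (?, ?)", LOGINS)
rows = conn.execute(WINDOW_QUERY).fetchall()
for row in rows:
    print(row)
```

Adding another offset only means raising `MAX_DAY`; the query keeps a single join, and COUNT(DISTINCT ...) also protects against duplicate login rows.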

But what if the logins need to be on consecutive days?
Then just change the JOIN criteria to:

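-- each join now chains off the previous day (t2 off t1, t3 off t2), so tN
-- only matches users who logged in on every day from t0 through tN
FROM table t0
LEFT JOIN table t1 
  ON t1.user = t0.user
 AND t1.login_date = t0.login_date + 1
LEFT JOIN table t2 
  ON t2.user = t1.user
 AND t2.login_date = t1.login_date + 1
LEFT JOIN table t3 
  ON t3.user = t2.user
 AND t3.login_date = t2.login_date + 1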
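To see how the consecutive-days variant differs, the sketch below runs the chained joins against the sample data in SQLite via Python's `sqlite3`. It assumes a hypothetical `logins(user_id, login_date)` table, and `date(..., '+1 days')` stands in for `+ 1`.

```python
import sqlite3

# Sample data from the question; table and column names are hypothetical.
LOGINS = [
    ("001", "2019-11-01"), ("002", "2019-11-01"), ("003", "2019-11-01"),
    ("004", "2019-11-01"), ("005", "2019-11-01"),
    ("001", "2019-11-02"), ("003", "2019-11-02"), ("004", "2019-11-02"),
    ("006", "2019-11-02"), ("007", "2019-11-02"),
    ("002", "2019-11-03"), ("003", "2019-11-03"), ("004", "2019-11-03"),
    ("005", "2019-11-03"), ("008", "2019-11-03"),
    ("001", "2019-11-04"), ("002", "2019-11-04"), ("006", "2019-11-04"),
    ("007", "2019-11-04"), ("009", "2019-11-04"),
]

# Chained joins: t2 only matches if t1 matched, t3 only if t2 matched.
CONSECUTIVE_QUERY = """
SELECT t0.login_date,
       COUNT(DISTINCT t0.user_id) AS d0,
       COUNT(DISTINCT t1.user_id) AS d1,
       COUNT(DISTINCT t2.user_id) AS d2,
       COUNT(DISTINCT t3.user_id) AS d3
FROM logins t0
LEFT JOIN logins t1
       ON t1.user_id = t0.user_id AND t1.login_date = date(t0.login_date, '+1 days')
LEFT JOIN logins t2
       ON t2.user_id = t1.user_id AND t2.login_date = date(t1.login_date, '+1 days')
LEFT JOIN logins t3
       ON t3.user_id = t2.user_id AND t3.login_date = date(t2.login_date, '+1 days')
GROUP BY t0.login_date
ORDER BY t0.login_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id TEXT, login_date TEXT)")
conn.executemany("INSERT INTO logins VALUES (?, ?)", LOGINS)
rows = conn.execute(CONSECUTIVE_QUERY).fetchall()
for row in rows:
    print(row)
```

For the 2019-11-01 cohort, d2 drops from 4 to 2 (only 003 and 004 logged in on both 2019-11-02 and 2019-11-03) and d3 drops from 2 to 0, because nobody in that cohort logged in on all four days.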


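Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.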

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM