User retention at a week level

I have subscriptions data as shown below.

+------+-----------------+-----------+-----------+----------+--------+
| user | subscription_id |   start   |    end    | wk_start | wk_end |
+------+-----------------+-----------+-----------+----------+--------+
|    1 | 1A              | 6/1/2019  | 6/30/2019 |       22 |     27 |
|    2 | 2A              | 6/1/2019  | 6/21/2019 |       22 |     25 |
|    3 | 3A              | 6/1/2019  | 6/21/2019 |       22 |     25 |
|    4 | 4A              | 6/1/2019  | 6/15/2019 |       22 |     24 |
|      |                 |           |           |          |        |
|    1 | 1B              | 7/4/2019  | 8/4/2019  |       27 |     32 |
|    2 | 2B              | 7/1/2019  | 7/31/2019 |       27 |     31 |
|    3 | 3B              | 6/24/2019 | 7/24/2019 |       26 |     30 |
+------+-----------------+-----------+-----------+----------+--------+

The data shows when each user bought a subscription. It has the user, the subscription ID, the start and end dates, and the week numbers of those dates (wk_start and wk_end). I want to compute user retention.

I want to see how many of the users who bought their first subscription in a particular week are still active in each of the following weeks.

A user counts as active if they are on their current subscription or on a new subscription bought after the current one expired.

The desired output is shown below.

+----------+-------------+----------------+--+-----------------------------------------------------------------------------+
| start_wk | Rolling_wk  | Retained Users |  |                  Active User(Not a part of desired output)                  |
+----------+-------------+----------------+--+-----------------------------------------------------------------------------+
|       22 |          22 |              4 |  | 1,2,3,4                                                                     |
|       22 |          23 |              4 |  | 1,2,3,4                                                                     |
|       22 |          24 |              4 |  | 1,2,3,4                                                                     |
|       22 |          25 |              3 |  | 1,2,3                                                                       |
|       22 |          26 |              2 |  | 1,3(with subscription_id = 3B)                                              |
|       22 |          27 |              3 |  | 1,2,3(1 is counted only once. He was active with subscription_id 1A and 1B) |
|       22 |          28 |              3 |  | 1,2,3                                                                       |
|       22 |          29 |              3 |  | 1,2,3                                                                       |
|       22 |          30 |              3 |  | 1,2,3                                                                       |
+----------+-------------+----------------+--+-----------------------------------------------------------------------------+

Note that the Active User column is not part of the desired output. It is only there to explain how the number in the Retained Users column is obtained.

I want the columns start_wk, Rolling_wk and Retained Users as output.
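To pin down the retention rule described above, here is a minimal Python sketch that reproduces the sample numbers. It assumes a user's cohort is the week of their earliest wk_start, and that a user is active in every week from wk_start to wk_end inclusive of any of their subscriptions. The function name `retention` and the `last_week` cutoff are hypothetical, introduced only for illustration.

```python
# Sample rows from the question: (user, subscription_id, wk_start, wk_end).
subscriptions = [
    (1, "1A", 22, 27), (2, "2A", 22, 25), (3, "3A", 22, 25), (4, "4A", 22, 24),
    (1, "1B", 27, 32), (2, "2B", 27, 31), (3, "3B", 26, 30),
]

def retention(rows, last_week):
    """Return {(start_wk, rolling_wk): retained_users} up to last_week."""
    # Cohort week of each user = week of their first subscription.
    first_week = {}
    for user, _, wk_start, _ in rows:
        first_week[user] = min(wk_start, first_week.get(user, wk_start))

    # (user, week) pairs in which the user holds any subscription.
    # A set counts a user once even when two subscriptions overlap.
    active = {(user, wk)
              for user, _, wk_start, wk_end in rows
              for wk in range(wk_start, wk_end + 1)}

    result = {}
    for user, start_wk in first_week.items():
        for wk in range(start_wk, last_week + 1):
            if (user, wk) in active:
                key = (start_wk, wk)
                result[key] = result.get(key, 0) + 1
    return result

table = retention(subscriptions, last_week=30)
for (start_wk, rolling_wk), users in sorted(table.items()):
    print(start_wk, rolling_wk, users)
```

For the sample data this gives 4, 4, 4, 3, 2, 3, 3, 3, 3 for rolling weeks 22 to 30, matching the Retained Users column. Weeks with no retained users are simply absent from the result.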

I will have a large amount of data like this for every week and want the output for each week in the same form. In each case start_wk changes, and rolling_wk starts from start_wk.

+----------+------------+----------------+
| start_wk | rolling_wk | Retained_users |
+----------+------------+----------------+
|       22 | 22         | 100            |
|       22 | 23         | 80             |
|       22 | 24         | 50             |
|       22 | ……         | ……             |
|       22 | …….        | …….            |
|       23 | 23         | 150            |
|       23 | 24         | 120            |
|       23 | 25         | 110            |
|       23 | 26         | 94             |
|       23 | ……         | ……             |
|       23 | …….        | …….            |
|       23 | …….        | …….            |
|       24 | 24         | 78             |
|       24 | 25         | 56             |
|       24 | 26         | 43             |
|       24 | …….        | …….            |
|       24 | …….        | …….            |
+----------+------------+----------------+

Any help will be appreciated.

Your query should look roughly like the one below. For each user (with one or several subscriptions), take the latest wk_end across their ongoing and upcoming plans, then count the users for whom the rolling week, wk_start + rownum (22+1, 22+2, ... in Oracle; row_number() in SQL Server, I guess), is not past that latest wk_end.

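    -- Sketch: w.week plays the role of wk_start + rownum, taken from a
    -- helper table weeks(week) that lists every week number (an assumption).
    -- A user is counted up to their latest wk_end, so a gap between two
    -- subscriptions is not detected.
    SELECT u.first_wk AS start_wk,
           w.week     AS rolling_wk,
           COUNT(*)   AS retained_users
    FROM (SELECT s.user,
                 COUNT(*)        AS no_of_subscriptions,
                 MIN(s.wk_start) AS first_wk,
                 MAX(s.wk_end)   AS wk_max
          FROM subscriptions s
          GROUP BY s.user) u
    JOIN weeks w
      ON w.week BETWEEN u.first_wk AND u.wk_max
    GROUP BY u.first_wk, w.week
    ORDER BY u.first_wk, w.week;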

I would create a helper table weeks with a single column week holding the values 1 to 56 (you can also fill it with a loop). Basically, the weeks table represents all possible week numbers.

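select 
    w1.week as start_wk, w2.week as rolling_wk,
    count(distinct s1.user) as Retained_Users
from 
    weeks w1, weeks w2, subscriptions s1 
where   
    w1.week <= w2.week and
    -- s1 starts in the cohort week w1 and is the user's first subscription
    s1.wk_start = w1.week and
    s1.wk_start <= ALL(
        select s2.wk_start 
        from subscriptions s2 
        where s2.user = s1.user 
    ) 
    and
    -- the user has some subscription that covers the rolling week w2
    exists (
        select 1
        from subscriptions s3
        where s3.user = s1.user and
            s3.wk_start <= w2.week and 
            w2.week <= s3.wk_end)
group by w1.week, w2.week
order by w1.week, w2.week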
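As a quick check, the sketch below runs a query of this shape against the sample rows using Python's built-in sqlite3 module. Two things in it are assumptions rather than part of the answer: SQLite has no `<= ALL`, so the first-subscription test is written with `MIN`, and each first subscription is tied to its cohort week with `s1.wk_start = w1.week`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subscriptions (
        user INTEGER, subscription_id TEXT, wk_start INTEGER, wk_end INTEGER);
    INSERT INTO subscriptions VALUES
        (1,'1A',22,27),(2,'2A',22,25),(3,'3A',22,25),(4,'4A',22,24),
        (1,'1B',27,32),(2,'2B',27,31),(3,'3B',26,30);
    CREATE TABLE weeks (week INTEGER PRIMARY KEY);
""")
# Helper table with every week number from 1 to 56, filled with a loop.
conn.executemany("INSERT INTO weeks VALUES (?)", [(w,) for w in range(1, 57)])

query = """
    SELECT w1.week AS start_wk,
           w2.week AS rolling_wk,
           COUNT(DISTINCT s1.user) AS retained_users
    FROM weeks w1
    JOIN weeks w2 ON w1.week <= w2.week
    JOIN subscriptions s1 ON s1.wk_start = w1.week       -- cohort week
    WHERE s1.wk_start = (SELECT MIN(s2.wk_start)         -- first subscription
                         FROM subscriptions s2
                         WHERE s2.user = s1.user)
      AND EXISTS (SELECT 1                               -- active in w2
                  FROM subscriptions s3
                  WHERE s3.user = s1.user
                    AND w2.week BETWEEN s3.wk_start AND s3.wk_end)
    GROUP BY w1.week, w2.week
    ORDER BY w1.week, w2.week
"""
rows = conn.execute(query).fetchall()
for start_wk, rolling_wk, retained in rows:
    print(start_wk, rolling_wk, retained)
```

On the sample data the rows for start week 22 give 4, 4, 4, 3, 2, 3, 3, 3, 3 for rolling weeks 22 to 30, as in the desired output, followed by 2 and 1 for weeks 31 and 32. On large tables, an index on (user, wk_start, wk_end) would likely help the correlated subqueries, though that depends on the database.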
