I have subscriptions data as shown below.
+------+-----------------+-----------+-----------+----------+--------+
| user | subscription_id | start | end | wk_start | wk_end |
+------+-----------------+-----------+-----------+----------+--------+
| 1 | 1A | 6/1/2019 | 6/30/2019 | 22 | 27 |
| 2 | 2A | 6/1/2019 | 6/21/2019 | 22 | 25 |
| 3 | 3A | 6/1/2019 | 6/21/2019 | 22 | 25 |
| 4 | 4A | 6/1/2019 | 6/15/2019 | 22 | 24 |
| | | | | | |
| 1 | 1B | 7/4/2019 | 8/4/2019 | 27 | 32 |
| 2 | 2B | 7/1/2019 | 7/31/2019 | 27 | 31 |
| 3 | 3B | 6/24/2019 | 7/24/2019 | 26 | 30 |
+------+-----------------+-----------+-----------+----------+--------+
The data shows when each user bought a subscription. It contains the user, the subscription_id, and the start and end dates. I want to measure user retention.
I want to see how many of the users who bought a subscription for the first time in a particular week are still active in each of the following weeks.
A user counts as active whether they are on their current subscription or on a new subscription bought after the current one expired.
The desired output is shown below:
+----------+-------------+----------------+--+-----------------------------------------------------------------------------+
| start_wk | Rolling_wk | Retained Users | | Active User(Not a part of desired output) |
+----------+-------------+----------------+--+-----------------------------------------------------------------------------+
| 22 | 22 | 4 | | 1,2,3,4 |
| 22 | 23 | 4 | | 1,2,3,4 |
| 22 | 24 | 4 | | 1,2,3,4 |
| 22 | 25 | 3 | | 1,2,3 |
| 22 | 26 | 2 | | 1,3(with subscription_id = 3B) |
| 22 | 27 | 3 | | 1,2,3(1 is counted only once. He was active with subscription_id 1A and 1B) |
| 22 | 28 | 3 | | 1,2,3 |
| 22 | 29 | 3 | | 1,2,3 |
| 22 | 30 | 3 | | 1,2,3 |
+----------+-------------+----------------+--+-----------------------------------------------------------------------------+
Note that the Active User column is not part of the desired output. It is only there to explain how the numbers in the Retained Users column are obtained.
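The counting rule described above can be checked against the sample rows with a short reference sketch. It is plain Python rather than SQL, and the function name `retention` and the `last_week` cutoff are illustrative assumptions, not part of the original question. A user's cohort is taken to be the week of their earliest subscription, and a user is counted once per week no matter how many subscriptions cover that week.

```python
# Sample rows from the question: (user, subscription_id, wk_start, wk_end).
SUBSCRIPTIONS = [
    (1, "1A", 22, 27), (2, "2A", 22, 25), (3, "3A", 22, 25), (4, "4A", 22, 24),
    (1, "1B", 27, 32), (2, "2B", 27, 31), (3, "3B", 26, 30),
]


def retention(subscriptions, last_week):
    """Return {(start_wk, rolling_wk): retained_user_count}."""
    # A user's cohort is the week of their earliest subscription.
    first_week = {}
    for user, _sub_id, wk_start, _wk_end in subscriptions:
        first_week[user] = min(wk_start, first_week.get(user, wk_start))

    result = {}
    for start_wk in sorted(set(first_week.values())):
        cohort = {u for u, w in first_week.items() if w == start_wk}
        for rolling_wk in range(start_wk, last_week + 1):
            # A set makes sure overlapping subscriptions count a user once.
            active = {
                user
                for user, _sub_id, wk_start, wk_end in subscriptions
                if user in cohort and wk_start <= rolling_wk <= wk_end
            }
            result[(start_wk, rolling_wk)] = len(active)
    return result


counts = retention(SUBSCRIPTIONS, last_week=30)
for (start_wk, rolling_wk), retained in sorted(counts.items()):
    print(start_wk, rolling_wk, retained)
```

For the sample data this reproduces the Retained Users column of the table above for weeks 22 through 30.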
I want the columns start_wk, Rolling_wk and Retained Users as output.
I will have a large amount of data like this for every week and want the output for each week in the same form. In each case start_wk changes and rolling_wk starts from start_wk:
+----------+------------+----------------+
| start_wk | rolling_wk | Retained_users |
+----------+------------+----------------+
| 22 | 22 | 100 |
| 22 | 23 | 80 |
| 22 | 24 | 50 |
| 22 | …… | …… |
| 22 | ……. | ……. |
| 23 | 23 | 150 |
| 23 | 24 | 120 |
| 23 | 25 | 110 |
| 23 | 26 | 94 |
| 23 | …… | …… |
| 23 | ……. | ……. |
| 23 | ……. | ……. |
| 24 | 24 | 78 |
| 24 | 25 | 56 |
| 24 | 26 | 43 |
| 24 | ……. | ……. |
| 24 | ……. | ……. |
+----------+------------+----------------+
Any help will be appreciated.
Your query should look roughly like the one below. It builds the sequence of rolling weeks by adding rownum to wk_start (22+1, 22+2, ... in Oracle; row_number() would be the equivalent in SQL Server, I guess) and counts subscribers, whether they hold one subscription or several, by comparing that rolling week with the latest end week (wk_max) across their ongoing and upcoming plans.
-- "table" is a placeholder for the subscriptions table name
SELECT wk_start,
       wk_start + rownum,
       (SELECT count(*)
          FROM table
         WHERE (wk_start + rownum) <= ALL
               (SELECT wk_max
                  FROM (SELECT user,
                               count(*) AS "no_of_subscriptions",
                               max(wk_end) AS wk_max
                          FROM table
                         GROUP BY user))) AS "Retained Users"
  FROM table;
I would create a helper table, weeks, with a single column 'week' holding the values 1 to 56 (you can fill it with a loop as well). Basically, the weeks table represents all possible week numbers.
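One way to build such a helper table without a loop is a recursive common table expression. The sketch below assumes SQLite, driven from Python's standard `sqlite3` module purely so that it is runnable as-is; other databases have their own sequence generators, so treat the exact syntax as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weeks (week INTEGER PRIMARY KEY)")

# Generate the week numbers 1..56 with a recursive CTE and store them.
conn.execute(
    """
    WITH RECURSIVE seq(week) AS (
        SELECT 1
        UNION ALL
        SELECT week + 1 FROM seq WHERE week < 56
    )
    INSERT INTO weeks (week)
    SELECT week FROM seq
    """
)

# Sanity check: smallest week, largest week, number of rows.
week_range = conn.execute(
    "SELECT MIN(week), MAX(week), COUNT(*) FROM weeks"
).fetchone()
print(week_range)
```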
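select
  w1.week as start_wk, w2.week as rolling_wk, count(distinct s1.user) as Retained_Users
from
  weeks w1, weeks w2, subscriptions s1
where
  w1.week <= w2.week and
  -- s1 is the user's first subscription, and it starts in the cohort week w1
  s1.wk_start = w1.week and
  s1.wk_start <= ALL(
    select s2.wk_start
    from subscriptions s2
    where s2.user = s1.user
  )
  and
  -- the user has some subscription that covers the rolling week w2
  exists (
    select 1
    from subscriptions s3
    where s3.user = s1.user and
      s3.wk_start <= w2.week and
      w2.week <= s3.wk_end
  )
group by w1.week, w2.week
order by w1.week, w2.week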
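To check a query of this shape against the sample data from the question, it can be run in an in-memory SQLite database. This is only a sketch under several assumptions: both week aliases read from the same `weeks` table, the cohort is tied to the first subscription through `s1.wk_start = w1.week`, the `<= ALL (...)` test is written as a `MIN()` subquery because SQLite has no `ALL` operator, and the existence check uses `EXISTS`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE weeks (week INTEGER PRIMARY KEY);
    CREATE TABLE subscriptions (
        user INTEGER, subscription_id TEXT, wk_start INTEGER, wk_end INTEGER
    );
    """
)
# Fill the helper table with a plain loop over the week numbers 1..56.
conn.executemany("INSERT INTO weeks VALUES (?)", [(w,) for w in range(1, 57)])
# Sample rows from the question.
conn.executemany(
    "INSERT INTO subscriptions VALUES (?, ?, ?, ?)",
    [
        (1, "1A", 22, 27), (2, "2A", 22, 25), (3, "3A", 22, 25),
        (4, "4A", 22, 24), (1, "1B", 27, 32), (2, "2B", 27, 31),
        (3, "3B", 26, 30),
    ],
)

RETENTION_SQL = """
SELECT w1.week AS start_wk,
       w2.week AS rolling_wk,
       COUNT(DISTINCT s1.user) AS retained_users
FROM weeks w1
JOIN weeks w2 ON w1.week <= w2.week
JOIN subscriptions s1 ON s1.wk_start = w1.week
WHERE s1.wk_start = (SELECT MIN(s2.wk_start)       -- first subscription only
                     FROM subscriptions s2
                     WHERE s2.user = s1.user)
  AND EXISTS (SELECT 1                             -- active in the rolling week
              FROM subscriptions s3
              WHERE s3.user = s1.user
                AND w2.week BETWEEN s3.wk_start AND s3.wk_end)
GROUP BY w1.week, w2.week
ORDER BY w1.week, w2.week
"""

rows = conn.execute(RETENTION_SQL).fetchall()
for row in rows:
    print(row)
```

Weeks in which nobody from a cohort is active produce no row at all, because the joins only keep matching combinations. If zero rows are needed, a left join from the week pairs would be one option.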