I am trying to write a query for monthly retention, to calculate percentage of users returning from their initial start month and moving forward.
TABLE: customer_order
fields
id
date
store_id
TABLE: customer
id
person_id
job_id
first_time (bool)
This gets me the initial monthly cohorts based on the first dates
SELECT first_job_month, COUNT( DISTINCT person_id) user_counts
FROM
( SELECT DATE_TRUNC(MIN(CAST(date AS DATE)), month) first_job_month, person_id
FROM customer_order cd
INNER JOIN consumer co ON co.job_id = cd.id
GROUP BY 2
ORDER BY 1 ) first_d GROUP BY 1 ORDER BY 1
first_job_month user_counts
2018-04-01 36
2018-05-01 37
2018-06-01 39
2018-07-01 45
2018-08-01 38
I have tried a bunch of things, but I can't figure out how to keep track of the original cohorts/users from the first month onwards
There are some alternative options like using window functions to do (1) and (2) in the same subquery but the easiest option is this one:
WITH
cohorts as (
SELECT person_id, DATE_TRUNC(MIN(CAST(date AS DATE)), month) as first_job_month
FROM customer_order cd
JOIN consumer co
ON co.job_id = cd.id
GROUP BY 1
)
,orders as (
SELECT
*
,round(1.0*(DATE_TRUNC(MIN(CAST(cd.date AS DATE))-c.first_job_month)/30) as months_since_first_order
FROM cohorts c
JOIN customer_order cd
USING (person_id)
)
SELECT
first_job_month as cohort
,count(distinct person_id) as size
,count(distinct case when months_since_first_order>=1 then person_id end) as m1
,count(distinct case when months_since_first_order>=2 then person_id end) as m2
,count(distinct case when months_since_first_order>=3 then person_id end) as m3
-- hardcode up to the number of months you want and the history you have
FROM orders
GROUP BY 1
ORDER BY 1
See, you can use CASE
statements inside the aggregate functions like COUNT
to identify different subsets of rows that you'd like to aggregate within the same group. This is one of the most important BI techniques in SQL.
Note, >=
not =
is used in the conditional aggregate so that for example if the customer buys in m3
after m1
and doesn't buy in m2
they will still be counted in m2
. If you want your customers to buy every month and/or see the actual retention for every month and are ok if subsequent months values can be higher than previous you can use =
.
Also, if you don't want the "triangle" view like one you get from this query or you don't want to hardcode the "mX" part you would just group by first_job_month
and months_since_first_order
and count distinct. Some visualization tools might consume this simple format and make a triangle view out of it.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.