
SQL Retention Cohort Analysis

I am trying to write a query for monthly retention, to calculate percentage of users returning from their initial start month and moving forward.

TABLE: customer_order
fields
id
date
store_id

TABLE: customer
id
person_id
job_id
first_time (bool)

This gets me the initial monthly cohorts, based on each user's first order date:

SELECT first_job_month, COUNT(DISTINCT person_id) user_counts
FROM (
    SELECT DATE_TRUNC(MIN(CAST(date AS DATE)), month) first_job_month, person_id
    FROM customer_order cd
    INNER JOIN consumer co ON co.job_id = cd.id
    GROUP BY 2
) first_d
GROUP BY 1
ORDER BY 1

first_job_month   user_counts
2018-04-01        36
2018-05-01        37
2018-06-01        39
2018-07-01        45
2018-08-01        38

I have tried a bunch of things, but I can't figure out how to keep track of the original cohorts/users from the first month onwards.

  1. Get the first order month for every customer
  2. Join orders to that subquery to find the difference in months between each order and the customer's first order
  3. Use conditional aggregates to count the customers who are still ordering by month X

There are alternatives, such as using window functions to do (1) and (2) in the same subquery, but the easiest option is this one:

WITH
cohorts as (
    SELECT person_id, DATE_TRUNC(MIN(CAST(date AS DATE)), month) as first_job_month
    FROM customer_order cd
    JOIN consumer co 
    ON co.job_id = cd.id
    GROUP BY 1
)
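,orders as (
    SELECT
     co.person_id
    ,c.first_job_month
    -- whole calendar months between this order's month and the cohort month
    ,DATE_DIFF(DATE_TRUNC(CAST(cd.date AS DATE), month), c.first_job_month, month) as months_since_first_order
    FROM customer_order cd
    -- customer_order has no person_id, so go through consumer to reach it
    JOIN consumer co
    ON co.job_id = cd.id
    JOIN cohorts c
    ON c.person_id = co.person_id
)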
SELECT
 first_job_month as cohort
,count(distinct person_id) as size
,count(distinct case when months_since_first_order>=1 then person_id end) as m1
,count(distinct case when months_since_first_order>=2 then person_id end) as m2
,count(distinct case when months_since_first_order>=3 then person_id end) as m3
-- hardcode up to the number of months you want and the history you have
FROM orders 
GROUP BY 1
ORDER BY 1

As you can see, you can put CASE expressions inside aggregate functions such as COUNT to pick out different subsets of rows and aggregate them within the same group. This is one of the most important BI techniques in SQL.
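As for the window-function alternative mentioned before the query, steps (1) and (2) can share one subquery. The following is only a rough sketch, assuming BigQuery syntax. The inline customer_order and consumer rows are made up for illustration (dates are assumed to be stored as strings, hence the CAST), and order_month is a helper name that is not in the original tables.

```sql
-- Hypothetical sample rows so the sketch runs on its own; drop these two
-- CTEs to run against the real tables.
WITH customer_order AS (
  SELECT 10 AS id, '2018-04-03' AS date UNION ALL
  SELECT 11, '2018-05-20' UNION ALL
  SELECT 12, '2018-05-07'
),
consumer AS (
  SELECT 1 AS person_id, 10 AS job_id UNION ALL
  SELECT 1, 11 UNION ALL
  SELECT 2, 12
),
orders AS (
  SELECT
    person_id,
    order_month,
    -- step 1 as a window function: earliest order month per person
    MIN(order_month) OVER (PARTITION BY person_id) AS first_job_month
  FROM (
    SELECT co.person_id, DATE_TRUNC(CAST(cd.date AS DATE), MONTH) AS order_month
    FROM customer_order cd
    JOIN consumer co ON co.job_id = cd.id
  )
)
SELECT
  person_id,
  first_job_month,
  -- step 2: months elapsed since the person's first order month
  DATE_DIFF(order_month, first_job_month, MONTH) AS months_since_first_order
FROM orders
ORDER BY person_id, months_since_first_order
```

The final SELECT produces the same three columns the conditional aggregates need, so it could stand in for the cohorts and orders CTEs.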

Note that the conditional aggregates use >= rather than =, so a customer who buys in m1 and m3 but not in m2 is still counted in m2. If you would rather see the actual retention for each individual month, and you are OK with a later month sometimes showing a higher value than an earlier one, use = instead.
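To make the difference concrete, here is a small sketch of the = variant, with one percentage column added since the question asks for percentages. It assumes BigQuery syntax. The inline orders rows are made-up stand-ins for the output of the orders CTE, and m1_pct is a hypothetical column name.

```sql
-- Hypothetical rows standing in for the orders CTE:
-- person 1 orders in months 0, 1 and 3; person 2 in months 0 and 2.
WITH orders AS (
  SELECT 1 AS person_id, DATE '2018-04-01' AS first_job_month, 0 AS months_since_first_order UNION ALL
  SELECT 1, DATE '2018-04-01', 1 UNION ALL
  SELECT 1, DATE '2018-04-01', 3 UNION ALL
  SELECT 2, DATE '2018-04-01', 0 UNION ALL
  SELECT 2, DATE '2018-04-01', 2 UNION ALL
  SELECT 3, DATE '2018-05-01', 0
)
SELECT
  first_job_month AS cohort,
  COUNT(DISTINCT person_id) AS size,
  -- "=" counts only the people who ordered in exactly that month
  COUNT(DISTINCT CASE WHEN months_since_first_order = 1 THEN person_id END) AS m1,
  COUNT(DISTINCT CASE WHEN months_since_first_order = 2 THEN person_id END) AS m2,
  COUNT(DISTINCT CASE WHEN months_since_first_order = 3 THEN person_id END) AS m3,
  -- share of the cohort that ordered in month 1, as a percentage
  ROUND(100 * COUNT(DISTINCT CASE WHEN months_since_first_order = 1 THEN person_id END)
        / COUNT(DISTINCT person_id), 1) AS m1_pct
FROM orders
GROUP BY 1
ORDER BY 1
```

On these made-up rows the 2018-04-01 cohort has size 2 with m1 = m2 = m3 = 1 and m1_pct = 50.0, whereas the >= version would report m1 = 2 and m2 = 2 for the same data.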

Also, if you don't want the "triangle" view that this query produces, or you don't want to hardcode the "mX" columns, you can simply group by first_job_month and months_since_first_order and count distinct person_id. Some visualization tools can consume this simple long format and build the triangle view from it.
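A minimal sketch of that long format, again assuming BigQuery syntax. The inline orders rows are made-up stand-ins for the output of the orders CTE, and active_users is a hypothetical column name.

```sql
-- Hypothetical rows standing in for the orders CTE.
WITH orders AS (
  SELECT 1 AS person_id, DATE '2018-04-01' AS first_job_month, 0 AS months_since_first_order UNION ALL
  SELECT 1, DATE '2018-04-01', 1 UNION ALL
  SELECT 1, DATE '2018-04-01', 3 UNION ALL
  SELECT 2, DATE '2018-04-01', 0 UNION ALL
  SELECT 2, DATE '2018-04-01', 2 UNION ALL
  SELECT 3, DATE '2018-05-01', 0
)
SELECT
  first_job_month AS cohort,
  months_since_first_order,
  -- one row per cohort and month offset, no hardcoded mX columns
  COUNT(DISTINCT person_id) AS active_users
FROM orders
GROUP BY 1, 2
ORDER BY 1, 2
```

The row with months_since_first_order = 0 is the cohort size, so a percentage can be derived downstream by dividing each row by that value. Note that this counts people active in exactly that month, which matches the = variant rather than the >= one.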
