How to Calculate Full/Repeat Retention in BigQuery SQL

Question

I am trying to calculate a "rolling retention" or "repeat retention" (I'm not sure what the appropriate name is): for each cohort, I only want to count the proportion of users who place an order in every single month consecutively.

So if 10 users place an order in Jan 2020, and 5 of them come back in Feb, that would equal a 50% retention.

Now for March, I only want to consider the 5 users who ordered in February, still taking note of the total January cohort size.

So if 2 users from February come back in March, retention for March will be 2/10 = 20%. If a user from Jan who didn't return in Feb places an order in March, they will not be included in the calculation for March, because they did not return in February.

Basically, this retention will progressively decrease to 0% and can never increase.

Here is what I have done so far:

 WITH first_order AS (SELECT 
  customerEmail,
  MIN(orderedat) as firstOrder,
FROM fact AS fact
GROUP BY 1 ),

cohort_data AS (SELECT 
  first_order.customerEmail,
  orderedAt as order_month,
  MIN(FORMAT_DATE("%y-%m (%b)", date(firstorder))) as cohort_month,
FROM first_order as first_order
LEFT JOIN fact as fact
ON first_order.customeremail = fact.customeremail
GROUP BY 1,2, FACT.orderedAt),

cohort_count AS (select cohort_month, count(distinct customeremail) AS total_cohort_count FROM cohort_data GROUP BY 1 )

SELECT  
    cd.cohort_month,
    date_trunc(date(cd.order_month), month) as order_month,
    total_cohort_count,
    count(distinct cd.customeremail) as total_repeat
FROM cohort_data as cd
JOIN cohort_data as last_month
    ON cd.customeremail= last_month.customeremail
    and date(cd.order_month) = date_add(date(last_month.order_month), interval 1 month)
LEFT JOIN cohort_count AS cc 
    on cd.cohort_month = cc.cohort_month
GROUP BY 1,2,3
ORDER BY  cohort_month, order_month ASC

Here is the result. I'm not sure where I went wrong, but the numbers are too small and the retention increases in some months, which shouldn't happen.

I did an INNER JOIN in the last query so I could compare the previous month to the current month, but it didn't work exactly how I wanted.

Sample Data:

I'd appreciate any help

Answer 1

I would start with one row per customer per month. Then I would enumerate each customer's months, keep only those that follow the first month with no gaps, and aggregate:

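with customer_months as (
      -- one row per customer per calendar month; date() assumes ordered_at
      -- is a TIMESTAMP or DATETIME, as the casts in the question suggest
      select customer_email,
             date_trunc(date(ordered_at), month) as yyyymm,
             -- inner min() is the per-group aggregate, outer min() is the window function
             min(min(date_trunc(date(ordered_at), month))) over (partition by customer_email) as first_yyyymm
      from cohort_data  -- stands in for the orders table (fact in the question); adjust column names to match
      group by 1, 2
     )
select first_yyyymm, yyyymm, count(*) as retained_customers
from (select cm.*,
             row_number() over (partition by customer_email order by yyyymm) as seqnum
      from customer_months cm
     ) cm
-- the n-th active month must be exactly n - 1 months after the first one, so any gap ends the streak
where yyyymm = date_add(first_yyyymm, interval seqnum - 1 month)
group by 1, 2
order by 1, 2;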
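
The query above returns the number of retained customers per cohort and month, while the question asks for that number as a share of the original cohort. Below is a minimal sketch of one way to get there, not a definitive implementation. It assumes the question's column names (customerEmail, orderedAt) and that orderedAt is a TIMESTAMP. The inline fact CTE is hypothetical sample data that mirrors the example in the question, plus one user who skips February and returns in March.

```sql
-- Hypothetical sample data: users 1-10 order in Jan 2020, users 1-5 in Feb,
-- users 1-2 in Mar, and user 10 skips Feb but orders again in Mar.
with fact as (
  select format('user%d@example.com', u) as customerEmail,
         timestamp '2020-01-15' as orderedAt
  from unnest(generate_array(1, 10)) as u
  union all
  select format('user%d@example.com', u), timestamp '2020-02-10'
  from unnest(generate_array(1, 5)) as u
  union all
  select format('user%d@example.com', u), timestamp '2020-03-05'
  from unnest([1, 2, 10]) as u
),
customer_months as (
  -- one row per customer per calendar month
  select distinct
         customerEmail,
         date_trunc(date(orderedAt), month) as order_month
  from fact
),
numbered as (
  select customerEmail,
         order_month,
         min(order_month) over (partition by customerEmail) as cohort_month,
         row_number() over (partition by customerEmail order by order_month) as seqnum
  from customer_months
),
counts as (
  -- keep only months that continue an unbroken run from the cohort month
  select cohort_month, order_month, count(*) as retained_users
  from numbered
  where order_month = date_add(cohort_month, interval seqnum - 1 month)
  group by 1, 2
)
select cohort_month,
       order_month,
       retained_users,
       -- the cohort month itself always keeps every user, so its count is the cohort size
       first_value(retained_users) over (partition by cohort_month order by order_month) as cohort_size,
       round(100 * retained_users
             / first_value(retained_users) over (partition by cohort_month order by order_month), 1) as retention_pct
from counts
order by 1, 2;
```

With this sample data the January 2020 cohort should show 10, 5 and 2 retained users, i.e. 100%, 50% and 20%. User 10's March order is dropped because it is their second active month but falls two months after their first. To run it against real data, remove the fact CTE so the query reads from the actual table.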
