
How to calculate customer retention in SQL based on events?

I am trying to write a SQL statement that finds the customers who have NOT attended three events in a row.

Table 1 - Customer: Customer ID, Customer Name

+-------------+---------------+
| Customer ID | Customer Name |
+-------------+---------------+
|          01 | Customer 01   |
|          02 | Customer 02   |
|          03 | Customer 03   |
+-------------+---------------+

Table 2 - Events: Event ID, Event Date, Event Name

+----------+------------+------------+
| Event ID | Event Date | Event Name |
+----------+------------+------------+
|       01 | 01/01/2020 | Event 01   |
|       02 | 01/15/2020 | Event 02   |
|       03 | 02/15/2020 | Event 03   |
|       04 | 03/13/2020 | Event 04   |
|       05 | 05/17/2020 | Event 05   |
|       06 | 06/20/2020 | Event 06   |
+----------+------------+------------+

Table 3 - Event Activity: Event ID, Customer ID

+----------+-------------+
| Event ID | Customer ID |
+----------+-------------+
|       01 |          01 |
|       01 |          02 |
|       01 |          03 |
|       02 |          01 |
|       03 |          01 |
|       03 |          02 |
|       04 |          01 |
|       05 |          01 |
|       06 |          01 |
|       06 |          03 |
+----------+-------------+

Now I am trying to find those customers that did not attend 3 events in a row.

So in the given example that would be Customer 2 and Customer 3.

I used the suggestion from Steve. Here are the updated SQL statements:

drop table if exists dbo.customer;
create table dbo.customer(
  CustID        int not null,
  CustName      varchar(20) not null);
insert dbo.customer(CustID, CustName) values
(1,'Cust 1'),
(2,'Cust 2'),
(3,'Cust 3'),
(4,'Cust 4'),
(5,'Cust 5')
;


drop table if exists dbo.events;
create table dbo.events(
  EventID       int not null,
  EventDate     date not null,
  EventName     varchar(20) not null);
insert dbo.events(EventId, EventDate, EventName) values
(1,'2020-01-01','Event 1'),
(2,'2020-01-15','Event 2'),
(3,'2020-02-15','Event 3'),
(4,'2020-03-13','Event 4'),
(5,'2020-05-17','Event 5'),
(6,'2020-06-20','Event 6');


drop table if exists dbo.eventactivity;
create table dbo.eventactivity(
  EventID       int not null,
  CustID        int not null);
insert dbo.eventactivity(EventID, CustID) values
(1,1),
(1,2),
(1,3),
(1,4),
(1,5),
(2,1),
(2,2),
(2,4),
(2,5),
(3,1),
(3,5),
(4,1),
(4,5),
(5,1),
(5,2),
(5,3),
(5,5),
(6,1),
(6,2),
(6,3),
(6,5);

And here is the query:

;with
events_sorted as (
    select e.*, row_number() over (order by EventDate) seq from dbo.events e),
activity_lag as 
(
    select
      a.*, e.seq,
      lag(e.seq, 1, 0) over (partition by CustId order by e.seq) lag_seq,
      iif(lag(e.seq, 1, 0) over (partition by CustId order by e.seq)=0, 1, 
          iif((e.seq-lag(e.seq, 1, 0) over (partition by CustId order by e.seq))>1, 1, 0)) seq_break
    from dbo.eventactivity a
         join events_sorted e on a.EventID=e.EventID
),
activity_lag_sum as (
    select
      alag.*, sum(seq_break) over (partition by CustId order by alag.seq) seq_grp
    from
      activity_lag alag
),
three_in_a_row_cte as (
    select distinct CustId
    from activity_lag_sum
    group by CustID, seq_grp
    having count(*)>=3
)
select *
from dbo.customer c
where not exists(select 1
                 from three_in_a_row_cte r
                 where c.CustID=r.CustID);

The problem is that this returns customers 2, 3 and 4. Customer 2 attended 2 events, skipped 2, then attended 2 again, so customer 2 never skipped 3 events in a row and shouldn't be on the list.

Any suggestions?

The following query returns the CustIds that have either 1) skipped 3 or more events in a row before an attended event, or 2) attended fewer than 3 events in total.

;with
events_sorted as (
    select e.*, row_number() over (order by EventDate) seq from #events e),
activity_lag as 
(
    select
      a.*, e.seq,
      lag(e.seq, 1, 0) over (partition by CustId order by e.seq) lag_seq,
      iif(lag(e.seq, 1, 0) over (partition by CustId order by e.seq)=0, 1, 
          iif((e.seq-lag(e.seq, 1, 0) over (partition by CustId order by e.seq))>1, 1, 0)) seq_break
    from #eventactivity a
         join events_sorted e on a.EventID=e.EventID
)
select distinct CustId
from activity_lag
where seq-lag_seq>3
union  -- not "union all", so a customer matching both conditions is listed once
select CustId
from activity_lag
group by CustId
having count(*)<3;

Results

CustId
3
4
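
As an alternative, here is a minimal sketch that tests "skipped 3 events in a row" directly rather than through gaps between attended events. It assumes the dbo.customer, dbo.events and dbo.eventactivity tables from the question, and that dbo.eventactivity holds at most one row per customer and event. The CTE names attendance and missed_runs and the columns missed and missed_last_3 are made up for illustration. It is not equivalent to the query above: it also catches skips after a customer's last attended event and customers with no activity rows at all, and it does not flag a customer merely for having attended fewer than 3 events.

```sql
;with
events_sorted as (
    -- number the events by date so consecutive events get consecutive seq values
    select e.EventID, row_number() over (order by e.EventDate) as seq
    from dbo.events e
),
attendance as (
    -- one row per customer per event; missed = 1 when no activity row exists
    select c.CustID, es.seq,
           case when a.EventID is null then 1 else 0 end as missed
    from dbo.customer c
         cross join events_sorted es
         left join dbo.eventactivity a
                on a.CustID = c.CustID and a.EventID = es.EventID
),
missed_runs as (
    -- count missed events among the current event and the two before it
    select CustID, seq,
           sum(missed) over (partition by CustID order by seq
                             rows between 2 preceding and current row) as missed_last_3
    from attendance
)
select distinct CustID
from missed_runs
where missed_last_3 = 3;
```

On the sample data from the question this should also return CustIDs 3 and 4: customer 3 misses events 2 to 4, customer 4 misses events 3 to 6, and customer 2 never misses more than two events in a row.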

You just need the customers who have skipped 3 or more events in a row, and you can get that by querying the event activity table itself. The query and its results are below.

Creating the table:

     create table event_activity ("event_id" varchar(2),"customer_id" varchar(2));
     insert into event_activity
     values ('01','01'),('01','02'),('01','03'),('02','01'),('02','03'),('03','01'),
     ('03','02'),('04','01'),('04','02'),('05','02'),('06','01'),('06','03'), 
     ('07','03'),('08','04'),('12','04'),('13','05');

The statements above produce the following table:

  event_id | customer_id
  ---------------------    
      01   |   01
      01   |   02
      01   |   03
      02   |   01
      02   |   03
      03   |   01
      03   |   02
      04   |   01
      04   |   02
      05   |   02
      06   |   01
      06   |   03
      07   |   03
      08   |   04
      12   |   04
      13   |   05
      

From the table above, the customers we need are 4 and 5: customer 4 skipped 3 events in a row (09 to 11), and customer 5 attended only 1 event.

PS: Customer 3 also skipped 3 events in a row (03 to 05), but attended the events before that gap without skipping any, so customer 3 is excluded.

Final Query

    select c.customer_id
    from
    (
       select customer_id, 
              skipped_count, 
              lag(skipped_count,1) over (partition by customer_id order by event_id) 
              as ref
       from 
          ( 
             select customer_id, 
                    event_id,
                    LAG(event_id,1) over (partition by customer_id order by event_id) 
                    as previous_event,
                    (event_id - LAG(event_id,1) over (partition by customer_id order by event_id) - 1)
                    as skipped_count
              from 
                 (
                   select CONVERT(int,event_id) as event_id, 
                          CONVERT(int, customer_id) as customer_id 
                          from event_activity
                 )a
            )b
     )c
     join
     (
         select convert(int,customer_id) as customer_id,
                count(event_id) as count_event
           from event_activity
           group by customer_id
     )d
     on c.customer_id=d.customer_id
     where (skipped_count >=3 and ref is null)
     or count_event = 1
     or (skipped_count >=3 and ref > 2)

Output

    4
    5
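
To see why the where clause keeps customer 4 but drops customer 3, it can help to look at the intermediate columns. The sketch below is just the two inner subqueries of the final query with an order by added. It assumes the event_activity table created above, and, like the final query, it assumes event IDs are consecutive integers, so a difference between two IDs equals the number of events in between.

```sql
-- skipped_count: events missed since the customer's previous attended event
-- ref:           skipped_count of the customer's previous row
select customer_id, event_id, skipped_count,
       lag(skipped_count, 1) over (partition by customer_id order by event_id) as ref
from (
    select customer_id, event_id,
           event_id - lag(event_id, 1) over (partition by customer_id order by event_id) - 1
               as skipped_count
    from (
        select convert(int, event_id) as event_id,
               convert(int, customer_id) as customer_id
        from event_activity
    ) a
) b
order by customer_id, event_id;
```

For customer 3, the row for event 6 has skipped_count 3 and ref 0, so it matches neither "ref is null" nor "ref > 2". For customer 4, the row for event 12 has skipped_count 3 and ref null, so it matches the first condition.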
