SAS start and end date from consecutive run

Question

I have a dataset of customers who buy items in several batches of consecutive days over the year. For example, Customer A buys on the 1st, 2nd and 3rd of January, stops, then buys again on the 1st, 2nd and 3rd of February.

I want to capture the first and last date of each consecutive batch for each customer. A plain MIN/MAX per customer won't work, because it treats all of a customer's purchases as one range and misses the gaps between batches.

I've experimented with RETAIN and LAG and I'm getting close, but it's not quite what I want.

How do I write a query that returns two rows for Customer A, i.e., row 1 with a start date of the 1st of January and an end date of the 3rd of January, and row 2 with a start date of the 1st of February and an end date of the 3rd of February?

Answer 1

You are asking to group the values based on whether there is a gap between consecutive dates. So test for a gap and create a new group-number variable. You can then use that variable as a grouping (BY) variable in your analysis.

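data want;
  set have;
  by id date;                            /* data must be sorted by customer and date */
  dif_days = dif(date);                  /* days since the previous purchase */
  if first.id then group = 1;            /* restart numbering for each customer */
  else if dif_days > 1 then group + 1;   /* sum statement: GROUP is retained; a gap starts a new run */
run;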

Change the threshold in the last IF statement (dif_days > 1) to control how large a gap is allowed while still treating the purchases as part of the same group.
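A minimal end-to-end sketch of both steps, assuming a dataset HAVE with a character customer ID named ID and a SAS date named DATE (the sample rows mirror the question's example and are illustrative only). The macro variable MAX_GAP is a hypothetical parameter standing in for the threshold in the last IF statement; the final DATA step uses FIRST./LAST. processing on the new GROUP variable to keep one row per run with its start and end date.

```sas
/* Hypothetical sample data based on the question: customer A buys on */
/* the 1st to 3rd of January and again on the 1st to 3rd of February. */
data have;
  input id $ date :date9.;
  format date date9.;
  datalines;
A 01JAN2019
A 02JAN2019
A 03JAN2019
A 01FEB2019
A 02FEB2019
A 03FEB2019
;
run;

/* Hypothetical parameter: largest gap (in days) still treated as the same run */
%let max_gap = 1;

/* Step 1: number the runs within each customer */
proc sort data=have;
  by id date;
run;

data grouped;
  set have;
  by id date;
  dif_days = dif(date);                          /* days since previous row */
  if first.id then group = 1;                    /* first run for a customer */
  else if dif_days > &max_gap then group + 1;    /* gap too large: new run */
run;

/* Step 2: keep one row per run with its first and last date */
data want;
  set grouped;
  by id group;
  retain start_date;                             /* carry start date to the run's last row */
  format start_date end_date date9.;
  if first.group then start_date = date;
  if last.group then do;
    end_date = date;
    output;
  end;
  keep id group start_date end_date;
run;

proc print data=want noobs;
run;
```

With the sample rows this should produce two rows for customer A: 01JAN2019 to 03JAN2019 and 01FEB2019 to 03FEB2019. PROC MEANS or PROC SUMMARY with MIN= and MAX= on DATE, BY ID GROUP, is an alternative to the second DATA step.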


Accepted answer (2), 2019-06-30 14:55:21
