简体   繁体   中英

User Subscription activity 1/0 table

I am trying to write a query that defines and then describes 'subscription activity' from a user transaction table. Let's say the transaction table we're working from has customer_id, created_at (transaction date), merchant_name, transaction_amount.

The final output table should have customer_Id as the first column, activity_month as the second column, merchant_name_x as the third column and _merchant_name_y as the 4th column (can have infinite columns from then on for various merchants). And here's the tricky part: the table should be populated with '1' if that cell represents a string of at least 3 consecutive months of that user transacting at least once with that merchant , and '0' otherwise.

Here is basic code that got me what would look like the output table with '1' for activity and '0' for no activity. It needs to be amended so that '1' appears only if its part of an unbroken string of >= 3 months of consecutive activity for the given merchant and '0' otherwise.

SELECT customer_id
    , LAST_DAY(created_at::DATE) AS month
    , MAX(CASE WHEN merchant_name = 'Amazon Prime' THEN 1 ELSE 0 END) AS amazon
    , MAX(CASE WHEN merchant_name = 'Netflix.com' THEN 1 ELSE 0 END) AS netflix
FROM TABLE
GROUP BY 1,2

The output would look something like this: '1' denotes that the user transacted with that merchant at least once (doesn't matter if once, twice, or 300 times that month) and that it is part of >= 3 consecutive months of that user transacting with that merchant

CUSTOMER_ID    MONTH       AMAZON   NETFLIX
54321          2019-04-30       1         0
54321          2019-03-31       1         0
54321          2019-02-29       1         1
54321          2019-01-31       1         1
54321          2018-12-31       0         1
54321          2018-11-30       0         0  

Working from a TABLE that lists and describes transactions:

ColumnsData                  Type
TRANSACTION_ID               NUMBER(38,0)
CREATED_AT                   TIMESTAMP_NTZ(9)
AMOUNT_DOLLARS   NUMBER(38,0)
CUSTOMER_ID                  NUMBER(38,0)
MERCHANT_NAME                VARCHAR(16777216)

(a sample of the data.table ):

TRANSACTION_ID     CREATED_AT     AMOUNT_DOLLARS   CUSTOMER_ID   MERCHANT_NAME
1234567            2018-08-23     57.31            306797979     Amazon Prime
7654321            2020-09-21     10.99            309221214     Stp & Shop
9999971            2020-01-07     11.59            509227711     Lyft Com
6549875            2019-05-10     88.23            311188226     Lttle Caesar
3121541            2020-07-31     72.01            307746845     Redbox
1279875            2020-04-05     15.20            315151515     Family Dollar

(timestamp left out from CREATED_AT for compactness) Query must scale across thousands of customers each making many transactions. Many thanks.

I think if I understand correctly you will need to use a windowed function like below:

UPDATE: updated to reflect sample data. I took the data you provided and added 3 additional records for Customer_ID = 306797979 and Merchant_name = Amazon. What you will see in the results is that the 2018-08 and 09 records are 0 because there are not 3 consecutive months of an amazon subscription at that point. When we hit 2018-10 and 11 they both now have 3 subsequent periods of transactions.

create temporary table temp (
TRANSACTION_ID NUMBER(38,0),
CREATED_AT TIMESTAMP_NTZ(9),
AMOUNT_DOLLARS NUMBER(38,0),
CUSTOMER_ID NUMBER(38,0),
MERCHANT_NAME VARCHAR(16777216)
);

insert into temp 
    values 
    (1234567, '2018-08-23 00:00:00', 57.31, 306797979, 'Amazon Prime'),
    (7654321, '2020-09-21 00:00:00', 10.99, 309221214, 'Stp & Shop'),
    (9999971, '2020-01-07 00:00:00', 11.59, 509227711, 'Lyft Com'),
    (6549875, '2019-05-10 00:00:00', 88.23, 311188226, 'Lttle Caesar'),
    (3121541, '2020-07-31 00:00:00', 72.01, 307746845, 'Redbox'),
    (1279875, '2020-04-05 00:00:00', 15.20, 315151515, 'Family Dollar'),
    (1234567, '2018-11-23 00:00:00', 57.31, 306797979, 'Amazon Prime'),
    (1234236, '2018-09-23 00:00:00', 57.31, 306797979, 'Amazon Prime'),
    (3972831, '2018-10-23 00:00:00', 57.31, 306797979, 'Amazon Prime');

with _filler_dates as (
    select
      date_trunc('MONTH',dateadd(
        month,
        '-' || row_number() over (order by null),
        dateadd(day, 1 , current_date()))) as filler_date
    from table (generator(rowcount => 30))
),
_data as(
select customer_id 
    , merchant_name
    , amount_dollars
    , date_trunc('MONTH', CREATED_AT) as cur_month
    , COALESCE(LEAD(cur_month,1,NULL) OVER (
        PARTITION BY customer_id
        ORDER BY cur_month)
        , date_trunc('MONTH', current_date())) as next_month   
from temp)
,_merged as (
select d.customer_id
    , d.merchant_name
    , CASE WHEN fd.filler_date <> cur_month then NULL else d.amount_dollars end as amount_dollars
    , fd.filler_date  
from _data d
join _filler_dates fd
 on fd.filler_date between  d.cur_month and dateadd(MONTH, -1, d.next_month)
)

select *
    , CASE WHEN sum(CASE WHEN amount_dollars > 0 THEN 1 ELSE 0 END)
                 OVER (PARTITION BY customer_id, merchant_name
                 ORDER BY filler_date ASC
                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) >= 3
           THEN 1 ELSE 0 END as amazon    
from _merged
order by 1,2,4;

//drop table temp;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM