
How to build a third report that combines the customer and order data

I have a question about calculating a retention rate.
I have two tables: the customer data and the order data.

DISTRIBUTOR as d
+---------+-----------+--------------+--------------------+
|   ID    | SETUP_DT  | REINSTATE_DT | LOCAL_REINSTATE_DT |
+---------+-----------+--------------+--------------------+
| C111111 | 2018/1/1  | Null         | Null               |
| C111112 | 2015/12/9 | 2018/10/25   | 2018/10/25         |
| C111113 | 2018/10/1 | Null         | Null               |
| C111114 | 2018/10/6 | 2018/12/14   | 2018/12/14         |
+---------+-----------+--------------+--------------------+
ORDER as o (please note that the data is for reference only)
+---------+----------+-----+
|   ID    |  ORD_DT  | OAL |
+---------+----------+-----+
| C111111 | 2018/1/1 | 112 |
| C111111 | 2018/1/1 | 100 |
| C111111 | 2018/1/1 | 472 |
| C111111 | 2018/1/1 | 452 |
| C111111 | 2018/1/1 | 248 |
| C111111 | 2018/1/1 | 996 |
+---------+----------+-----+
The third table I have in mind, to build the retention rate report:
+---------+-----------+-----------+---------------+-----------+
|   ID    |  APP_MON  |  ORD_MON  | TimeDiff(Mon) |  TTL_AMT  |
+---------+-----------+-----------+---------------+-----------+
| C111111 | 2018/1/1  | 2018/1/1  |             - |  25,443   |
| C111111 | 2018/1/1  | 2018/2/1  |             1 |  7,610    |
| C111111 | 2018/1/1  | 2018/3/1  |             2 |  20,180   |
| C111111 | 2018/1/1  | 2018/4/1  |             3 |  22,265   |
| C111111 | 2018/1/1  | 2018/5/1  |             4 |  34,118   |
| C111111 | 2018/1/1  | 2018/6/1  |             5 |  19,523   |
| C111111 | 2018/1/1  | 2018/7/1  |             6 |  20,220   |
| C111111 | 2018/1/1  | 2018/8/1  |             7 |  2,006    |
| C111111 | 2018/1/1  | 2018/9/1  |             8 |  15,813   |
| C111111 | 2018/1/1  | 2018/10/1 |             9 |  16,733   |
| C111111 | 2018/1/1  | 2018/11/1 |            10 |  20,973   |
| C111112 | 2018/10/1 | 2017/11/1 |             - |  516      |
| C111112 | 2018/10/1 | 2018/10/1 |             - |  1        |
| C111113 | 2018/10/1 | Null      |             - | Null      |
| C111114 | 2018/12/1 | Null      |             - | Null      |
+---------+-----------+-----------+---------------+-----------+

Definition:
- APP_MON: the month in which the customer joined, i.e. the first day of the month of the latest of [d.SETUP_DT], [d.REINSTATE_DT] and [d.LOCAL_REINSTATE_DT]
- ORD_MON: the month in which the customer purchased, i.e. the first day of the order date's month
- TimeDiff: the number of months between APP_MON and ORD_MON, e.g. if A's ORD_MON is 2018/1/1 and A's APP_MON is 2018/2/1, the duration is 1.
- TTL_AMT: the total order amount that the customer bought in that order month

I tried to produce the data for the third table, but the query I ran is very slow. I need a more efficient approach, since I have millions of rows. Thanks.

I don't think you need to use unpivot. To get the latest date you can just use the greatest() function.
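
One caveat worth checking: in Oracle, greatest() returns NULL if any of its arguments is NULL, and the sample data has NULL reinstate dates. The following is a minimal sketch, assuming setup_dt is never NULL and that falling back to it is acceptable when a reinstate date is missing. The inline rows are just two of the sample customers from the question, built from dual so the query runs on its own.

```sql
-- Two sample rows from the question, built inline (no real tables needed).
with d as (
    select 'C111111' as id
         , date '2018-01-01' as setup_dt
         , cast(null as date) as reinstate_dt
         , cast(null as date) as local_reinstate_dt
    from dual
    union all
    select 'C111112', date '2015-12-09', date '2018-10-25', date '2018-10-25'
    from dual
)
select d.id
       -- naive version: NULL as soon as one of the three dates is NULL
     , greatest(d.setup_dt, d.reinstate_dt, d.local_reinstate_dt) as naive_dt
       -- NULL-safe version, truncated to the first day of the month
     , trunc(greatest(d.setup_dt
                    , nvl(d.reinstate_dt, d.setup_dt)
                    , nvl(d.local_reinstate_dt, d.setup_dt)), 'mm') as app_mon
from d
order by d.id;
```

For C111111 the naive column is NULL while app_mon is 2018-01-01; for C111112 app_mon is 2018-10-01, which matches the APP_MON shown in the desired report.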

This solution has two subqueries: one calculates app_mon for each new customer, and the other calculates, for each customer and order month, the earliest order date and the total order amount for orders placed in the last two years. This may not be the most performant approach, but your first priority should be to get the correct outcome; once you have that, you can tune it if necessary:

with cust as 
(
    -- app_mon: the latest of the three dates. Oracle's greatest() returns
    -- NULL if any argument is NULL, so fall back to setup_dt.
    select d.dist_id as id
          , greatest(d.setup_dt
                   , nvl(d.reinstate_dt, d.setup_dt)
                   , nvl(d.local_reinstate_dt, d.setup_dt)) as app_mon 
    from mjensen_dev.gc_distributor d
    where d.setup_dt >= date '2017-01-01'
    or d.reinstate_dt >= date '2017-01-01'
    or d.local_reinstate_dt >= date '2017-01-01'
) , ord as 
(
    -- one row per customer per order month
    select o.dist_id as id
          , min(o.ord_dt) as ord_mon 
          , sum(o.oal) as ord_amt
    from gc_orders o
    where o.ord_dt >= date '2017-01-01'
    group by o.dist_id
          , trunc(o.ord_dt,'mm')
)
select cust.id
       , cust.app_mon
       , ord.ord_mon
       , floor(months_between(ord.ord_mon, cust.app_mon)) as mon_diff
       , ord.ord_amt
from cust
     -- switch to a left outer join to keep customers with no orders
     inner join ord on cust.id = ord.id
order by 1, 2, 3
/

You may wish to tweak my calculation of mon_diff. It treats 2018/2/1 - 2018/1/1 as a one-month difference, because it seems odd to me that a customer who places an order on the day they joined would have a mon_diff of 1 rather than zero. But if your statement of the business rule is correct, you would need to add 1 to the calculation. Likewise, I have not applied trunc() to the dates in the final select, but you may wish to reinstate it.
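
To make the two conventions concrete, here is a small sketch that compares them side by side. The dates are illustrative rather than taken from the real tables, and the column name mon_diff_plus_1 is made up for this example. Both dates are truncated to the first day of their month before the comparison, so months_between() returns a whole number and floor() is not needed.

```sql
-- One customer who joined mid-January 2018 and ordered in January and February.
with t as (
    select date '2018-01-15' as app_dt, date '2018-01-20' as ord_dt from dual
    union all
    select date '2018-01-15', date '2018-02-03' from dual
)
select trunc(t.app_dt, 'mm') as app_mon
     , trunc(t.ord_dt, 'mm') as ord_mon
       -- zero-based: an order in the joining month has a difference of 0
     , months_between(trunc(t.ord_dt, 'mm'), trunc(t.app_dt, 'mm')) as mon_diff
       -- one-based variant, if the business rule counts the joining month as 1
     , months_between(trunc(t.ord_dt, 'mm'), trunc(t.app_dt, 'mm')) + 1 as mon_diff_plus_1
from t
order by ord_mon;
```

The January order gives 0 and 1, and the February order gives 1 and 2. Without trunc(), months_between(date '2018-02-03', date '2018-01-15') is a fraction below 1, so floor() would report 0 for an order placed in the following calendar month.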


 