SAP HANA SQL match string patterns and combinations

I am using SAP HANA and need to extract rows from a table based on combinations of values in a single column.
Below is my customer table layout:

Table A:

Customer | Product
---------|--------
ABC      | P1
ABC      | P2
ABC      | P3
ABC      | G2
ABC      | G4
ABC      | G6
ABC      | G5

Below are the Sales Campaign rules:

Combinations:

Campaign_Name | Products_Purchased
--------------|-------------------
Campaign_1    | (P1 & P2 & P3)
Campaign_2    | (G2 or G4) & G6

If a customer purchased all of the products P1, P2 and P3, the customer qualifies for Campaign_1.
If a customer purchased at least one of G2 or G4, and also purchased G6, the customer qualifies for Campaign_2.
In this example, customer 'ABC' purchased the product combinations required by both campaigns, so it qualifies for both.

Expected Result:

Customer | Sales_Campaign
---------|---------------
ABC      | Campaign_1
ABC      | Campaign_2

Below are the steps I have performed so far:
Step 1: I extracted all the products purchased at a customer level.
Step 2: I aggregated all the products purchased by a customer into a single row, separated by commas.
Step 3: I listed all the possible combinations manually in a CASE statement and set up a flag column for each campaign to indicate whether the customer qualifies for it.
Step 4: If the flag is 'Y', I extracted the respective campaign name.

But as the number of products increases, Step 3 becomes impossible to maintain by hand.

Step2 Result:

Customer | P_COMBO      | G_COMBO
---------|--------------|----------------
ABC      | #,1,P1,P2,P3 | #,1,G2,G4,G5,G6

Step4 Result:

Customer | campaign1_flag | Campaign2_flag
---------|----------------|---------------
ABC      | Y              | Y

SELECT DISTINCT
  B.CUSTOMER,
  (CASE WHEN P_COMBO = '#,1,P1,P2,P3' THEN 'Y' ELSE 'N' END) AS campaign1_flag,
  (CASE 
     WHEN G_COMBO  ='#,1,G2,G6'       THEN 'Y'
     WHEN G_COMBO  ='#,1,G4,G6'       THEN 'Y'
     WHEN G_COMBO  ='#,1,G2,G5,G6'    THEN 'Y'
     WHEN G_COMBO  ='#,1,G4,G5,G6'    THEN 'Y'
     WHEN G_COMBO  ='#,1,G2,G4,G5,G6' THEN 'Y'
     ELSE 'N'
  END) AS Campaign2_flag
FROM (
  SELECT DISTINCT
    A.CUSTOMER ,
    (   MAX(CASE WHEN A.PRODUCT= '1'  THEN A.PRODUCT ELSE '#' END)
      || MAX(CASE WHEN A.PRODUCT= 'P1'  THEN ',' || A.PRODUCT ELSE '' END)
      || MAX(CASE WHEN A.PRODUCT= 'P2'  THEN ',' || A.PRODUCT ELSE '' END)
      || MAX(CASE WHEN A.PRODUCT= 'P3'  THEN ',' || A.PRODUCT ELSE '' END)
    ) AS P_COMBO  ,
    (   MAX(CASE WHEN A.PRODUCT= '1'  THEN A.PRODUCT ELSE '#' END)
      || MAX(CASE WHEN A.PRODUCT= 'G2'  THEN ',' || A.PRODUCT ELSE '' END)
      || MAX(CASE WHEN A.PRODUCT= 'G4'  THEN ',' || A.PRODUCT ELSE '' END)
      || MAX(CASE WHEN A.PRODUCT= 'G5'  THEN ',' || A.PRODUCT ELSE '' END)
      || MAX(CASE WHEN A.PRODUCT= 'G6'  THEN ',' || A.PRODUCT ELSE '' END)
    ) AS G_COMBO
    FROM
    (SELECT DISTINCT CUSTOMER,PRODUCT FROM customer) A
    GROUP BY A.CUSTOMER
 ) B

Please help me find a better solution. Any ideas are very much appreciated.

If you allow for changing the way you store the definition of what groups go into a campaign, then the following solution is possible:

  1. Campaigns are defined by a set of groups.
  2. The groups contain product IDs or names.
  3. Any group can be treated as an OR group, meaning that to match this group, at least one product from this group has to be bought. Alternatively, groups can be AND groups meaning all products need to be bought to match the group.

This can look like so:

create column table camp_grps
    (camp NVARCHAR(20) not null
    , grp integer not null
    , grp_type NVARCHAR(3) not null
    , product NVARCHAR(20) not null);


insert into camp_grps values
    ('Campaign 1', 1, 'AND', 'P1');
insert into camp_grps values
    ('Campaign 1', 1, 'AND', 'P2');
insert into camp_grps values
    ('Campaign 1', 1, 'AND', 'P3');

insert into camp_grps values
    ('Campaign 2', 1, 'OR', 'G2');
insert into camp_grps values
    ('Campaign 2', 1, 'OR', 'G4');
insert into camp_grps values
    ('Campaign 2', 2, 'AND', 'G6');


CAMP       | GRP    | GRP_TYPE| PRODUCT
-----------|--------|---------|-------  
Campaign 1 |    1   |   AND   |     P1     
Campaign 1 |    1   |   AND   |     P2     
Campaign 1 |    1   |   AND   |     P3     
Campaign 2 |    1   |   OR    |     G2     
Campaign 2 |    1   |   OR    |     G4     
Campaign 2 |    2   |   AND   |     G6  
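
The combined query further down reads from a table called purchases, which this answer never defines. A minimal sketch, assuming it simply mirrors the question's Table A with one row per customer and product; the column types and the single XYZ row are assumptions made here for illustration, chosen so that XYZ covers only part of Campaign 1:

```sql
-- Hypothetical definition: the post does not show this table.
create column table purchases
    (customer NVARCHAR(20) not null
    , product NVARCHAR(20) not null);

-- Rows for customer ABC, copied from the question's Table A.
insert into purchases values ('ABC', 'P1');
insert into purchases values ('ABC', 'P2');
insert into purchases values ('ABC', 'P3');
insert into purchases values ('ABC', 'G2');
insert into purchases values ('ABC', 'G4');
insert into purchases values ('ABC', 'G6');
insert into purchases values ('ABC', 'G5');

-- Assumed row: a customer who bought only one Campaign 1 product.
insert into purchases values ('XYZ', 'P1');
```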

Now we can get an easy overview of

  • campaigns,
  • the groups in each of them including
  • the type of group,
  • how many groups make up the campaign and
  • how many products make up each group.

The SQL for this looks like this:

select 
    camp
  , grp
  , grp_type
  , count(distinct grp) over
      (partition by camp)     camp_grp_cnt
  , count(*) prd_cnt
from
    camp_grps
group by
    camp, grp, grp_type;   

CAMP        |GRP    |GRP_TYPE   |CAMP_GRP_CNT   |PRD_CNT
------------|-------|-----------|---------------|-------
Campaign 1  |1      |AND        |1              |3      
Campaign 2  |1      |OR         |2              |2      
Campaign 2  |2      |AND        |2              |1      

The rest is a bit tedious, but not overly complicated. We need to

  • count how many matching purchases each customer has for each grp, and
  • depending on the type of group (AND / OR), count a group as matched when the number of matches with the group is either larger than 0 (OR group) or at least the number of products in the group (AND group).
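
The matching rule on its own can be sketched with plain aggregates, separate from the window-function version used in this answer. This is only an illustration, assuming a purchases(customer, product) table; the aliases grp_hits and grp_size are hypothetical names introduced here:

```sql
-- One row per customer and campaign group that counts as matched.
select
    grp_hits.customer, grp_hits.camp, grp_hits.grp, grp_size.grp_type
from
    -- how many distinct products of each group the customer bought
    (select p.customer, cg.camp, cg.grp
          , count(distinct p.product) as hit_cnt
     from purchases p
     inner join camp_grps cg
         on cg.product = p.product
     group by p.customer, cg.camp, cg.grp) grp_hits
inner join
    -- how many products each group contains
    (select camp, grp, grp_type, count(*) as prd_cnt
     from camp_grps
     group by camp, grp, grp_type) grp_size
    on  grp_size.camp = grp_hits.camp
    and grp_size.grp  = grp_hits.grp
where
       (grp_size.grp_type = 'OR'  and grp_hits.hit_cnt > 0)
    or (grp_size.grp_type = 'AND' and grp_hits.hit_cnt = grp_size.prd_cnt);
```

Counting distinct products keeps a repeated purchase of the same product from inflating the hit count for an AND group.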

The combined SQL looks like this:

with grp_ref as
    (select 
         camp
       , grp
       , grp_type
       , count(distinct grp) over
           (partition by camp)     camp_grp_cnt
       , count(*) prd_cnt
    from
        camp_grps
    group by
        camp, grp, grp_type),
grp_match as
(select 
     p.customer, p.product
   , gr.camp camp_ref, gr.grp grp_ref, gr.grp_type grp_type_ref, gr.prd_cnt prd_cnt_ref
   , gr.camp_grp_cnt
   , count(*) over
        (partition by p.customer, cg.camp, gr.grp) camp_total_matches
from 
        purchases p
    left outer join camp_grps cg
            on p.product = cg.product
    inner join grp_ref gr
        on (cg.camp, cg.grp) = (gr.camp, gr.grp)),
match_cnter as 
 (select
      customer, product
    , camp_ref, grp_ref, grp_type_ref, prd_cnt_ref, camp_total_matches
    , camp_grp_cnt
    , case 
        when grp_type_ref = 'AND' 
            and ((camp_total_matches - prd_cnt_ref) >= 0) then
                'AND grp matched'
        -- an OR group is matched by a single purchase from the group
        when grp_type_ref = 'OR' 
            and (camp_total_matches > 0) then
                'OR grp matched'
      end matched_grp_info
    , case 
        when grp_type_ref = 'AND' 
            and ((camp_total_matches - prd_cnt_ref) >= 0) then
                1
        when grp_type_ref = 'OR' 
            and (camp_total_matches > 0) then
                1
      end matched_grp
 from 
    grp_match)
select  
    customer, camp_ref, SUM(matched_grp), MIN(camp_grp_cnt)
from 
    match_cnter
group by 
    customer, camp_ref;

The result looks like this:

CUSTOMER    |CAMP_REF   |SUM(MATCHED_GRP)   | MIN(CAMP_GRP_CNT)
------------|-----------|-------------------|---------------------------
ABC         |Campaign 1 |3                  |1                
ABC         |Campaign 2 |3                  |2                
XYZ         |Campaign 1 |?                  |1                

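Per CUSTOMER and CAMP_REF (campaign reference), SUM(MATCHED_GRP) adds 1 for every purchase row that falls into a matched group, and MIN(CAMP_GRP_CNT) shows how many groups have to be matched to be eligible for the campaign. Because the sum runs over purchase rows rather than distinct groups, it can exceed the group count: Campaign 1 has a single group, but ABC's three matching purchases give a sum of 3.

Customer ABC is eligible for both Campaign 1 and 2, while customer XYZ has not matched any group (see the ? as NULL in the result).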
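
To get from these counts to the question's expected Customer / Sales_Campaign list, one option is to count distinct matched groups per customer and campaign and compare that with the campaign's group count. A hedged sketch, not the answer author's query, assuming a purchases(customer, product) table; all CTE names are made up here:

```sql
with grp_size as
    (select camp, grp, grp_type, count(*) as prd_cnt
     from camp_grps
     group by camp, grp, grp_type),
camp_size as
    (select camp, count(distinct grp) as grp_cnt
     from camp_grps
     group by camp),
grp_hits as
    (select p.customer, cg.camp, cg.grp
          , count(distinct p.product) as hit_cnt
     from purchases p
     inner join camp_grps cg
         on cg.product = p.product
     group by p.customer, cg.camp, cg.grp),
matched_grps as
    -- keep only groups that satisfy their AND / OR rule
    (select h.customer, h.camp, h.grp
     from grp_hits h
     inner join grp_size s
         on s.camp = h.camp and s.grp = h.grp
     where (s.grp_type = 'OR'  and h.hit_cnt > 0)
        or (s.grp_type = 'AND' and h.hit_cnt = s.prd_cnt))
select
    m.customer, m.camp as sales_campaign
from
    matched_grps m
inner join camp_size c
    on c.camp = m.camp
group by
    m.customer, m.camp, c.grp_cnt
-- eligible only when every group of the campaign is matched
having
    count(*) = c.grp_cnt
order by
    m.customer, m.camp;
```

With the question's sample purchases, ABC should come back once for each campaign. A customer who bought only part of an AND group never reaches matched_grps for that group, so the campaign is not listed for that customer.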
