
Multiplication of column values based on conditional row grouping in SQL Server

[image: input data]

For the above data, I want to calculate a percentage value for each purchase row (purchase = 1) based on grouping the row.

The conditions for the calculation are:

  1. The visit_time of previous rows should be within 7 days from the purchase visit_time.
  2. The rows with the same visitor id should only be considered in the calculation.

For example, the percentage values should be calculated as below:

  • Row 2 Percent_val = val of row 1 x val of row 2 = 0.23 x 0.97 = 0.2231
  • Row 3 Percent_val = val of row 1 x val of row 2 x val of row 3 = 0.23 x 0.97 x 0.55 = 0.122705
  • Row 4 Percent_val = val of row 4 = 0.11
  • Row 7 Percent_val = val of row 5 x val of row 6 x val of row 7 = 0.57 x 0.16 x 0.38 = 0.034656 (row 4 is not considered because its visit_time is not within 7 days of the purchase row, i.e. row 7)

I am using SQL Server 2012.

The expected result would be similar to below:

[image: expected result]

How can I get the expected result here?

Script to generate test data:

    CREATE TABLE [#tmp_data]
    (
        [visitor]       INT,
        [visit_id]      INT,
        [visit_time]    DATETIME,
        [val]           NUMERIC(4,2),
        [purchase]      BIT
    );

    INSERT INTO #tmp_data( visitor, visit_id, visit_time, val, purchase )
    VALUES ( 1, 1001, '2020-01-01 10:00:00', 0.23, 0 ),
           ( 1, 1002, '2020-01-02 11:00:00', 0.97, 1 ),
           ( 1, 1003, '2020-01-02 14:00:00', 0.55, 1 ),
           ( 2, 2001, '2020-01-01 10:00:00', 0.11, 1 ),
           ( 2, 2002, '2020-01-07 11:00:00', 0.57, 0 ),
           ( 2, 2003, '2020-01-08 14:00:00', 0.16, 0 ),
           ( 2, 2004, '2020-01-11 14:00:00', 0.38, 1 );

In SQL Server, one option is a lateral join, written with cross apply:

select t.*, x.percent_val
from #tmp_data t
cross apply (
    select exp(sum(log(t1.val))) percent_val
    from #tmp_data t1
    where t1.visitor = t.visitor and t1.visit_time > dateadd(day, - 7, t.visit_time) and t1.visit_time <= t.visit_time
) x
where t.purchase = 1

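The lateral join retrieves the same visitor's visits from the 7 days up to and including the purchase row. We then use arithmetic to compute the aggregate product of the values: the exponential of a sum of logarithms equals the product of the inputs (this works as long as val is greater than 0).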
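To see the log/exp trick in isolation, here is a minimal standalone check using the first two values of visitor 1 from the question. The derived table alias v and its column name are only illustrative, and since log and exp work on float, the result may differ from the exact decimal product in the last digits.

```sql
-- exp(sum(log(x))) equals the product of the x values when every x > 0,
-- because log(a) + log(b) = log(a * b).
select exp(sum(log(v.val))) as product_via_logs  -- approximately 0.23 * 0.97
from (values (0.23), (0.97)) as v(val);
```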

Demo on DB Fiddle:

visitor | visit_id | visit_time              |  val | purchase | percent_val
------: | -------: | :---------------------- | ---: | :------- | ----------:
      1 |     1002 | 2020-01-02 11:00:00.000 | 0.97 | True     |      0.2231
      1 |     1003 | 2020-01-02 14:00:00.000 | 0.55 | True     |    0.122705
      2 |     2001 | 2020-01-01 10:00:00.000 | 0.11 | True     |        0.11
      2 |     2004 | 2020-01-11 14:00:00.000 | 0.38 | True     |    0.034656

If you want to handle 0 values as well, you can change the select clause of the subquery:

select case when min(val) = 0 
    then 0 
    else exp(sum(log(case when val > 0 then t1.val end))) 
end percent_val
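Put together with the outer query, the zero-safe version might look like the sketch below. It assumes val is never negative (log is undefined there), and it has not been run against the fiddle, so treat it as a starting point rather than a verified result.

```sql
select t.*, x.percent_val
from #tmp_data t
cross apply (
    select case when min(t1.val) = 0
        then 0
        -- the inner case turns zeros into null, so log() never receives 0
        else exp(sum(log(case when t1.val > 0 then t1.val end)))
    end as percent_val
    from #tmp_data t1
    where t1.visitor = t.visitor
      and t1.visit_time > dateadd(day, -7, t.visit_time)
      and t1.visit_time <= t.visit_time
) x
where t.purchase = 1;
```

The min(t1.val) = 0 branch returns 0 as soon as any row in the window is zero, which matches what a true product would give.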
