
SQL Server - SUM Preceding Rows With Condition Linked to Original Row

For each row in the example data set below, the code sums Score over the previous 5 rows when a certain condition is met.

The problem I'm having is that the condition needs to reference the rating of the original row, i.e. I need to sum preceding rows only if their rating is within 1 of the current row's rating.

Example data:

DECLARE @tbl TABLE 
             (
                 Team varchar(1),
                 date date, 
                 Rating int, 
                 Score int
             );

INSERT INTO @tbl (Team, Date, Rating, Score)
VALUES
('a', '2020/12/05', '20', '1'),
('a', '2020/12/04', '18', '8'),
('a', '2020/12/03', '21', '3'),
('a', '2020/12/02', '19', '4'),
('a', '2020/12/01', '19', '3');

Current code:

SELECT
    Rating, 
    SUM(CASE WHEN Rating >= (Rating-1) AND  Rating <= (Rating+1) THEN SCORE END) 
        OVER (partition by Team ORDER BY Date ASC ROWS BETWEEN 5 PRECEDING AND 1 PRECEDING) AS SUM
FROM
    @tbl
ORDER BY 
    Date DESC

Output (what the current code returns vs. what I need):

    +------------------+------------+------------+
    |  Rating          | Current    | Required   | 
    +------------------+------------+------------+
    | 20               | 18         |     7      |
    | 18               | 10         |     7      |
    | 21               | 7          |     NULL   |
    | 19               | 3          |     3      |
    | 18               | NULL       |     NULL   |
    +------------------+------------+------------+

The problem is that the following section of the code is not working, because the rating is assessed on a row-by-row basis: each preceding row's Rating is compared with itself, so the condition is always true.

CASE WHEN Rating >= (Rating-1) AND  Rating <= (Rating+1)

I need it to assess against the rating of the original row (I've looked into TOP, but that isn't working):

CASE WHEN Rating >= ((SELECT TOP 1 Rating) - 1) AND Rating <= ((SELECT TOP 1 Rating) + 1)

Any help appreciated as always.

What you are describing sounds like a lateral join, which SQL Server expresses with APPLY:

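SELECT t.*, prev.score_5
FROM @tbl t
OUTER APPLY
     (SELECT SUM(p.Score) AS score_5
      FROM (-- the 5 rows immediately before the current row, for the same team
            SELECT TOP (5) t2.*
            FROM @tbl t2
            WHERE t2.Team = t.Team
              AND t2.Date < t.Date
            ORDER BY t2.Date DESC
           ) p
      -- keep only preceding rows whose rating is within 1 of the current row's rating
      WHERE p.Rating BETWEEN t.Rating - 1 AND t.Rating + 1
     ) prev
ORDER BY t.Date DESC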
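
To try this approach against the sample data from the question, a self-contained script might look like the sketch below. The values are written as plain integers and ISO dates rather than quoted strings, and the filter on Team is an assumption added here to mirror the PARTITION BY Team in the question's query. The alias names (p, prev) are illustrative only.

```sql
-- Sample data copied from the question.
DECLARE @tbl TABLE
(
    Team   varchar(1),
    [Date] date,
    Rating int,
    Score  int
);

INSERT INTO @tbl (Team, [Date], Rating, Score)
VALUES
('a', '2020-12-05', 20, 1),
('a', '2020-12-04', 18, 8),
('a', '2020-12-03', 21, 3),
('a', '2020-12-02', 19, 4),
('a', '2020-12-01', 19, 3);

SELECT t.[Date], t.Rating, prev.score_5
FROM @tbl AS t
OUTER APPLY
(
    SELECT SUM(p.Score) AS score_5
    FROM (
        -- Up to 5 rows immediately before the current row.
        SELECT TOP (5) t2.Rating, t2.Score
        FROM @tbl AS t2
        WHERE t2.Team = t.Team          -- assumption: mirrors PARTITION BY Team
          AND t2.[Date] < t.[Date]
        ORDER BY t2.[Date] DESC
    ) AS p
    -- The outer row's rating is visible here, which a window function cannot offer.
    WHERE p.Rating BETWEEN t.Rating - 1 AND t.Rating + 1
) AS prev
ORDER BY t.[Date] DESC;

-- Result for the data above:
-- Date        Rating  score_5
-- 2020-12-05  20      10
-- 2020-12-04  18      7
-- 2020-12-03  21      NULL
-- 2020-12-02  19      3
-- 2020-12-01  19      NULL
```

The rating filter sits outside the TOP (5) subquery on purpose: the window is "the previous 5 rows", and the condition then selects among them. Moving the filter inside would instead return the previous 5 rows that match.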

I'm not familiar with SQL Server syntax, but here's how to do it using Spark SQL. Essentially, the idea is to create a row for each pair of rows that are within 5 rows of each other, and then sum the score only if the earlier row's rating is within 1 of the current row's rating.

select
    Team, date, Rating,
    sum(case when old_score[0] between rating-1 and rating+1 then old_score[1] end) as sum
from (
    select
        *,
        explode_outer(scores) as old_score
    from (
        select
            *,
            collect_list(array(rating, score))
            over (partition by Team order by date rows between 5 preceding and 1 preceding) scores
        from tbl
    )
)
group by Team, date, Rating
order by Team, date, Rating;

which gives

a       2020-12-01      19      NULL
a       2020-12-02      19      3
a       2020-12-03      21      NULL
a       2020-12-04      18      7
a       2020-12-05      20      10

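and reveals that you've probably made a mistake in your expected output ;) With the sample data as posted, the 2020-12-05 row should be 10 rather than 7. The expected 7 would only follow if the 2020-12-01 rating were 18, as your output table shows, rather than the 19 in the sample data.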
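
For readers on SQL Server, the same pairing idea can be sketched in T-SQL without collect_list or explode_outer. This is not the Spark answer's code but a swapped-in technique: number the rows per team with ROW_NUMBER, then self-join each row to the rows numbered 1 to 5 positions before it. The CTE name (numbered) and the aliases (cur, prev) are hypothetical, and the data is the question's sample set.

```sql
-- Sample data copied from the question.
DECLARE @tbl TABLE
(
    Team   varchar(1),
    [Date] date,
    Rating int,
    Score  int
);

INSERT INTO @tbl (Team, [Date], Rating, Score)
VALUES
('a', '2020-12-05', 20, 1),
('a', '2020-12-04', 18, 8),
('a', '2020-12-03', 21, 3),
('a', '2020-12-02', 19, 4),
('a', '2020-12-01', 19, 3);

WITH numbered AS
(
    -- Position of each row within its team, oldest first.
    SELECT Team, [Date], Rating, Score,
           ROW_NUMBER() OVER (PARTITION BY Team ORDER BY [Date]) AS rn
    FROM @tbl
)
SELECT cur.Team, cur.[Date], cur.Rating,
       SUM(prev.Score) AS score_5
FROM numbered AS cur
LEFT JOIN numbered AS prev
       ON prev.Team = cur.Team
      AND prev.rn BETWEEN cur.rn - 5 AND cur.rn - 1                    -- previous 5 rows
      AND prev.Rating BETWEEN cur.Rating - 1 AND cur.Rating + 1        -- within 1 of current rating
GROUP BY cur.Team, cur.[Date], cur.Rating, cur.rn
ORDER BY cur.Team, cur.[Date];

-- Result for the data above:
-- Team  Date        Rating  score_5
-- a     2020-12-01  19      NULL
-- a     2020-12-02  19      3
-- a     2020-12-03  21      NULL
-- a     2020-12-04  18      7
-- a     2020-12-05  20      10
```

The LEFT JOIN keeps rows that have no qualifying predecessors, so they come back with NULL rather than disappearing, which matches the Spark output above.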


 