Grouping SQL rows based on overlapping active time intervals, valid from and valid to

I'm working in BigQuery with this mock data:

create schema if not exists dbo;
create table if not exists dbo.player_history(team_id INT, player_id INT, active_from TIMESTAMP, active_to TIMESTAMP);
truncate table dbo.player_history;
INSERT INTO dbo.player_history VALUES(1,1,'2020-01-01', '2020-01-08');
INSERT INTO dbo.player_history VALUES(1,2,'2020-06-01', '2020-09-08');
INSERT INTO dbo.player_history VALUES(1,3,'2020-06-10', '2020-10-01');
INSERT INTO dbo.player_history VALUES(1,4,'2020-02-01', '2020-02-15');
INSERT INTO dbo.player_history VALUES(1,5,'2021-01-01', '2021-01-08');
INSERT INTO dbo.player_history VALUES(1,6,'2021-01-02', '2021-06-08');
INSERT INTO dbo.player_history VALUES(1,7,'2021-01-03', '2021-06-08');
INSERT INTO dbo.player_history VALUES(1,8,'2021-01-04', '2021-06-08');
INSERT INTO dbo.player_history VALUES(1,9,'2020-01-02', '2021-02-05');
INSERT INTO dbo.player_history VALUES(1,10,'2020-10-01', '2021-04-08');
INSERT INTO dbo.player_history VALUES(1,11,'2020-11-01', '2021-05-08');


select *
 from dbo.player_history
order by 3, 4

What I want to get out is the active lineups. The output would look like this:

https://imgur.com/a/2j8HPiD (image above)

With the logic behind it being: [image]

I've almost cracked it using some sort of lead(valid_from) between valid_to and valid_from, doing a CASE WHEN to make it 1 if it's a new lineup and 0 otherwise, and then doing some sort of cumulative sum on that to get the ID, but I'm not able to solve it 100%... I'm very desperate and don't know where to look anymore.
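
The flag-then-cumulative-sum idea described above can be sketched roughly as follows. This is a minimal sketch under assumptions, not a solution to the question: it uses the table's `active_from`/`active_to` columns in place of `valid_from`/`valid_to`, it compares each row against the running maximum of all earlier `active_to` values rather than a single `lead`, and it assumes a "new lineup" starts whenever a row overlaps none of the rows before it. The names `flagged`, `prev_max_active_to` and `group_id` are made up for illustration.

```sql
-- Sketch only: flag rows that start a new group, then take a running sum of the flags.
WITH flagged AS (
  SELECT
    team_id,
    player_id,
    active_from,
    active_to,
    -- Latest active_to among all earlier rows of the same team (NULL for the first row).
    MAX(active_to) OVER (
      PARTITION BY team_id
      ORDER BY active_from, active_to, player_id
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS prev_max_active_to
  FROM `dbo.player_history`
)
SELECT
  team_id,
  player_id,
  active_from,
  active_to,
  -- The flag is 1 when the row overlaps nothing before it (or is the first row), else 0;
  -- the running sum of the flags becomes the group number.
  SUM(CASE WHEN active_from <= prev_max_active_to THEN 0 ELSE 1 END) OVER (
    PARTITION BY team_id
    ORDER BY active_from, active_to, player_id
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS group_id
FROM flagged
ORDER BY active_from, active_to, player_id
```

On the mock data, player 9 is active from 2020-01-02 to 2021-02-05 and overlaps every other row either directly or through a chain of overlaps, so this sketch assigns `group_id = 1` to all eleven rows. That suggests plain overlap merging may not be the grouping wanted here.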

**Correction: lineups 4 & 5 should actually just be one lineup.**

Given that a player can belong to multiple lineups, as we discussed in the comment section, you might try the approach below using JOIN:

WITH LINEUPS AS 
    -- Pair each row (a) with every player (b) on the same team whose active_from falls inside a's interval
    (SELECT a.*, b.player_id AS b_player_id
    FROM `dbo.player_history` a
    INNER JOIN `dbo.player_history` b
        ON b.team_id = a.team_id
        AND b.active_from BETWEEN a.active_from AND a.active_to)
SELECT 
    team_id,
    -- Number the lineups per team in chronological order
    ROW_NUMBER() OVER (PARTITION BY team_id ORDER BY active_from, active_to) AS lineup_id,
    active_from, 
    active_to, 
    -- Collect the matched players for each interval
    ARRAY_AGG(DISTINCT b_player_id) AS player_ids
FROM LINEUPS 
GROUP BY team_id, active_from, active_to
ORDER BY active_from, active_to


Since the output is too long to show in a screenshot of the BigQuery console, I extracted the results to Google Sheets. See the screenshot of the output below:

[screenshot of output]
