简体   繁体   中英

Selecting rows until a certain value within partitions

I'm trying to select rows up until a certain event type is reached. For example below we want to select all rows containing event1 until we reach a row containing event2 , partitioned by id

Input:

| id     | type   | sequence |
|--------|--------|----------|
| abc123 | event1 | 1        |
| abc123 | event1 | 2        |
| abc123 | event1 | 3        |
| abc123 | event2 | 4        |
| abc123 | event1 | 5        |
| abc123 | event1 | 6        |
| def456 | event1 | 4        |
| def456 | event1 | 5        |
| def456 | event2 | 6        |
| def456 | event1 | 7        |
| ghi789 | event2 | 1        |
| ghi789 | event1 | 2        |
| ghi789 | event1 | 3        |

Which would give the following result.

Output:

| id     | type   | sequence |
|--------|--------|----------|
| abc123 | event1 | 1        |
| abc123 | event1 | 2        |
| abc123 | event1 | 3        |
| abc123 | event2 | 4        |
| def456 | event1 | 4        |
| def456 | event1 | 5        |
| def456 | event2 | 6        |
| ghi789 | event2 | 1        |

I've tired ranking the data row_number() over(partition by id, type order by sequence) rank_ , in order to distinguish the data I want to select but can't work out the next step in actually defining the selection criteria. I'm essentially wanting to define a condition that says:

select * from table where seq=<{integer appearing in sequence column when row features event2 in type column, partitioned by id}

You can use window functions to get the minimum sequence for "event2":

select t.*
from (select t.*,
             min(case when type = 'event2' then sequence end) over (partition by id) as event2_sequence
      from t
     ) t
where event2_sequence is null or
      sequence <= event2_sequence;

Note: I am assume that if there is no "event2" then you want all rows. Also, if you have other events, then you might want to filter down to just "event1" and "event2".

As a note, you can use a correlated subquery:

select t.*
from t
where t.sequence <= (select min(t2.sequence)
                     from t t2
                     where t2.id = t.id and t2.type = 'event2'
                    );

Note: This will not return any ids which do not have "event2", although you could handle this as:

select t.*
from t
where t.sequence <= (select coalesce(min(case when t2.type = 'event2' then t2.sequence end),
                                     max(t2.sequence)
                                    )
                     from t t2
                     where t2.id = t.id
                    );

Below is for BigQuery Standard SQL

#standardSQL
SELECT * EXCEPT(flag) FROM (
  SELECT id, type, sequence, 0 = COUNTIF(type = 'event2') OVER(win) flag
  FROM `project.dataset.table`
  WINDOW win AS (PARTITION BY id ORDER BY sequence ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE flag 

You can test, play with above using sample data from your question as in below example

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'abc123' id, 'event1' type, 1 sequence UNION ALL
  SELECT 'abc123', 'event1', 2 UNION ALL
  SELECT 'abc123', 'event1', 3 UNION ALL
  SELECT 'abc123', 'event2', 4 UNION ALL
  SELECT 'abc123', 'event1', 5 UNION ALL
  SELECT 'abc123', 'event1', 6 UNION ALL
  SELECT 'def456', 'event1', 4 UNION ALL
  SELECT 'def456', 'event1', 5 UNION ALL
  SELECT 'def456', 'event2', 6 UNION ALL
  SELECT 'def456', 'event1', 7 UNION ALL
  SELECT 'ghi789', 'event2', 1 UNION ALL
  SELECT 'ghi789', 'event1', 2 UNION ALL
  SELECT 'ghi789', 'event1', 3 
)
SELECT * EXCEPT(flag) FROM (
  SELECT id, type, sequence, 0 = COUNTIF(type = 'event2') OVER(win) flag
  FROM `project.dataset.table`
  WINDOW win AS (PARTITION BY id ORDER BY sequence ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
)
WHERE flag 
-- ORDER BY id, sequence   

with output

Row id      type    sequence     
1   abc123  event1  1    
2   abc123  event1  2    
3   abc123  event1  3    
4   abc123  event2  4    
5   def456  event1  4    
6   def456  event1  5    
7   def456  event2  6    
8   ghi789  event2  1    

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM