
Manipulate data in SQL (backfilling, pivoting)

I have a table similar to this small example:

[Image: example input table]

I want to manipulate it to this format:

[Image: desired output format]

Here's a sample SQL script to create an example input table:

CREATE TABLE sample_table
(
    id INT,
    hr INT,
    tm DATETIME,
    score INT
);

INSERT INTO sample_table
VALUES (1, 0, '2021-01-21 00:26:45', 2765),
(1, 0, '2021-01-21 00:49:00', 2765),
(1, 5, '2021-01-21 07:47:03', 1593),
(1, 7, '2021-01-21 11:50:48', 1604),
(1, 7, '2021-01-21 12:00:32', 1604),
(2, 0, '2021-01-21 00:50:45', 3500),
(2, 2, '2021-01-21 01:49:00', 2897),
(2, 2, '2021-01-21 05:47:03', 2897),
(2, 4, '2021-01-21 09:30:48', 2400),
(2, 6, '2021-01-21 12:00:32', 1647);

I tried a combination of LAG and CASE WHEN, without success so far. I'm looking for ideas on how to do this transformation (which functions to use, etc.). An example script for it would be awesome.

Where there are multiple values per id and hr, the earliest value should be used. For example, for id=1 and hr=7, hr_7 should use the value from 11:50. In this example both records have the same value, but they can differ.
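A common way to keep only the earliest row per (id, hr) is ROW_NUMBER() ordered by tm, then filtering on 1. The snippet below is a minimal sketch assuming PostgreSQL; the readings CTE (a hypothetical name) stands in for sample_table so it runs on its own, and earliest_per_hour is a hypothetical temporary view. ROW_NUMBER() works the same way in SQL Server, but CREATE TEMP VIEW and a bare VALUES list inside a CTE are PostgreSQL syntax.

```sql
-- PostgreSQL sketch: keep only the earliest reading per (id, hr).
-- The rows are the id = 1 rows for hr = 0 and hr = 7 from the sample data above.
CREATE TEMP VIEW earliest_per_hour AS
WITH readings (id, hr, tm, score) AS (
    VALUES (1, 0, TIMESTAMP '2021-01-21 00:26:45', 2765),
           (1, 0, TIMESTAMP '2021-01-21 00:49:00', 2765),
           (1, 7, TIMESTAMP '2021-01-21 11:50:48', 1604),
           (1, 7, TIMESTAMP '2021-01-21 12:00:32', 1604)
)
SELECT id, hr, tm, score
FROM (
    SELECT r.*,
           -- 1 = earliest timestamp within each (id, hr) pair
           row_number() OVER (PARTITION BY id, hr ORDER BY tm) AS rn
    FROM readings r
) ranked
WHERE rn = 1;

SELECT * FROM earliest_per_hour ORDER BY id, hr;
```

It returns two rows: hr 0 with the 00:26:45 reading and hr 7 with the 11:50:48 reading.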

Thanks for the test scripts, which make life easy.

Here is one way to approach this in PostgreSQL.

In the first block (data), I build every combination of id and hour number 0 to 8, so each id is repeated 9 times. The data therefore looks like this:

id  num
1   0
1   1
...
1   8
2   0
...
2   8
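The grid step can be sketched on its own, assuming PostgreSQL. The ids CTE and the id_hour_grid view are hypothetical names; in the real query the ids come from select distinct id from sample_table. CROSS JOIN is the idiomatic spelling of the join ... on 1=1 used in the query further down.

```sql
-- PostgreSQL sketch of the "data" step: one row per (id, hour 0..8).
CREATE TEMP VIEW id_hour_grid AS
WITH ids (id) AS (
    VALUES (1), (2)   -- stands in for SELECT DISTINCT id FROM sample_table
)
SELECT ids.id, h.num
FROM ids
CROSS JOIN generate_series(0, 8) AS h(num);   -- same as JOIN ... ON 1=1

SELECT * FROM id_hour_grid ORDER BY id, num;
```

It returns 18 rows, nine hours for each of the two ids.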

Next, in the raw_data block, I left join this grid to the actual rows in sample_table, which guarantees at least one row for every hour 0 to 8. I also number the rows within each (id, hr) by timestamp, so the earliest reading gets rnk = 1 (an hour with no readings is a single row, so it also gets rnk = 1).
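The point of the left join is that an hour with no readings still produces exactly one row, and that row gets rnk = 1 as well, so filtering on rnk = 1 keeps every hour. A minimal PostgreSQL sketch for id = 1 only, using three of the sample rows; readings and raw_hours are hypothetical names.

```sql
-- PostgreSQL sketch of the "raw_data" step for a single id.
-- Every hour 0..8 survives the LEFT JOIN; hours with no reading get NULLs.
CREATE TEMP VIEW raw_hours AS
WITH readings (id, hr, tm, score) AS (
    VALUES (1, 0, TIMESTAMP '2021-01-21 00:26:45', 2765),
           (1, 0, TIMESTAMP '2021-01-21 00:49:00', 2765),
           (1, 5, TIMESTAMP '2021-01-21 07:47:03', 1593)
)
SELECT g.num, r.tm, r.score,
       -- earliest reading per hour gets 1; an unmatched hour is one row, so it gets 1 too
       row_number() OVER (PARTITION BY g.num ORDER BY r.tm) AS rnk
FROM generate_series(0, 8) AS g(num)
LEFT JOIN readings r ON r.hr = g.num;

SELECT * FROM raw_hours WHERE rnk = 1 ORDER BY num;
```

Filtered on rnk = 1 it returns nine rows: hour 0 shows the 00:26:45 reading, hour 5 shows 1593, and the other hours have NULL scores.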

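I then keep only the rnk = 1 rows and carry the last known score forward. The running count sum(case when score is not null then 1 else 0 end) over (partition by id1 order by num1) gives grp, which only increases at hours that have a score, so each empty hour falls into the same group as the most recent hour with data. max(score) over (partition by id1, grp) then returns that score for every hour in the group.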
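The grp trick can be seen on a single id's hourly scores after the rnk = 1 filter. A minimal PostgreSQL sketch; hourly and filled_hours are hypothetical names, and count(score) over (order by num) is used as a shorter equivalent of the sum(case when score is not null then 1 else 0 end) running count, since count(expr) skips NULLs.

```sql
-- PostgreSQL sketch of the gap-filling step for id = 1.
-- grp counts the non-NULL scores seen so far, so a NULL hour shares
-- its grp with the last hour that had a score.
CREATE TEMP VIEW filled_hours AS
WITH hourly (num, score) AS (
    VALUES (0, 2765), (1, NULL), (2, NULL), (3, NULL), (4, NULL),
           (5, 1593), (6, NULL), (7, 1604), (8, NULL)
),
grouped AS (
    SELECT num, score,
           count(score) OVER (ORDER BY num) AS grp   -- running count of non-NULL scores
    FROM hourly
)
SELECT num, score, grp,
       max(score) OVER (PARTITION BY grp) AS filled_score   -- the one non-NULL score in the group
FROM grouped;

SELECT * FROM filled_hours ORDER BY num;
```

filled_score comes out as 2765 for hours 0 to 4, 1593 for hours 5 and 6, and 1604 for hours 7 and 8, matching the id = 1 row of the output.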

Finally, I group the data by id and pivot it with conditional aggregation (max(case when num1 = ... then earliest_score end)), which produces the expected output.

Output

+-----+------+------+------+------+------+------+------+------+------+
| id1 | hr_0 | hr_1 | hr_2 | hr_3 | hr_4 | hr_5 | hr_6 | hr_7 | hr_8 |
+-----+------+------+------+------+------+------+------+------+------+
|   1 | 2765 | 2765 | 2765 | 2765 | 2765 | 1593 | 1593 | 1604 | 1604 |
|   2 | 3500 | 3500 | 2897 | 2897 | 2400 | 2400 | 1647 | 1647 | 1647 |
+-----+------+------+------+------+------+------+------+------+------+



/*
CREATE TABLE sample_table
(
    id INT,
    hr INT,
    tm timestamp,
    score INT
 );
 
INSERT INTO sample_table
VALUES (1, 0, '2021-01-21 00:26:45', 2765),
(1, 0, '2021-01-21 00:49:00', 2765),
(1, 5, '2021-01-21 07:47:03', 1593),
(1, 7, '2021-01-21 11:50:48', 1604),
(1, 7, '2021-01-21 12:00:32', 1604),
(2, 0, '2021-01-21 00:50:45', 3500),
(2, 2, '2021-01-21 01:49:00', 2897),
(2, 2, '2021-01-21 05:47:03', 2897),
(2, 4, '2021-01-21 09:30:48', 2400),
(2, 6, '2021-01-21 12:00:32', 1647);
*/
-- data: every id paired with every hour 0..8
with data
  as (select b.id as id
             ,f  as num
        from generate_series(0,8) f
        join (select distinct id from sample_table) as b          
          on 1=1
     )       
    -- raw_data: left join the readings; rnk = 1 marks the earliest reading per (id, hour)
    ,raw_data
     as (
   select d.id as id1
          ,d.num as num1
          ,st.*
          ,row_number() over(partition by d.id,d.num order by st.tm asc) as rnk
     from data d
left join sample_table st
       on d.id=st.id
      and d.num=st.hr
       )
     -- prep_data: grp puts each empty hour in the same group as the last hour with a score
     ,prep_data
      as (select id1
                ,num1
                ,max(score) over(partition by id1,grp) as earliest_score 
           from (select id1,num1,score
                       ,sum(case when score is not null then 1 else 0 end)
                        over(partition by id1 order by num1) as grp
                   from raw_data
                  where rnk=1  
                 )x
          )
-- pivot: one column per hour via conditional aggregation
select id1
       ,max(case when num1=0 then earliest_score end) as hr_0
       ,max(case when num1=1 then earliest_score end) as hr_1
       ,max(case when num1=2 then earliest_score end) as hr_2
       ,max(case when num1=3 then earliest_score end) as hr_3
       ,max(case when num1=4 then earliest_score end) as hr_4
       ,max(case when num1=5 then earliest_score end) as hr_5
       ,max(case when num1=6 then earliest_score end) as hr_6
       ,max(case when num1=7 then earliest_score end) as hr_7
       ,max(case when num1=8 then earliest_score end) as hr_8
  from prep_data
group by id1 
order by id1;
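PostgreSQL (9.4 and later) also supports an aggregate FILTER clause, which expresses the same conditional aggregation as the max(case when num1 = ... end) pivot above a little more compactly. A sketch on a few already-filled rows taken from the expected output; filled and pivoted are hypothetical names. SQL Server does not support FILTER, so the CASE form is the portable one.

```sql
-- PostgreSQL sketch: the same pivot written with FILTER instead of CASE.
-- The filled rows are the first three hours of the expected output.
CREATE TEMP VIEW pivoted AS
WITH filled (id1, num1, earliest_score) AS (
    VALUES (1, 0, 2765), (1, 1, 2765), (1, 2, 2765),
           (2, 0, 3500), (2, 1, 3500), (2, 2, 2897)
)
SELECT id1,
       max(earliest_score) FILTER (WHERE num1 = 0) AS hr_0,
       max(earliest_score) FILTER (WHERE num1 = 1) AS hr_1,
       max(earliest_score) FILTER (WHERE num1 = 2) AS hr_2
FROM filled
GROUP BY id1;

SELECT * FROM pivoted ORDER BY id1;
```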

I tried to set up the scripts on db-fiddle, but it kept crashing on this PostgreSQL query.

It does work in a PostgreSQL database, though; I ran it successfully here:

https://extendsclass.com/postgresql-online.html

I would suggest this logic:

with u as (   -- keep the earliest row per (id, hr) and find the next hr that has data
      select id, hr, tm, score,
             lead(hr) over (partition by id order by hr) as next_hr
      from (select t.*,
                   row_number() over (partition by id, hr order by tm asc) as seqnum
            from sample_table t
           ) x
      where seqnum = 1
     )
select id,
       max(case when hr <= 0 and (next_hr > 0 or next_hr is null) then score end) as hr_0,
       max(case when hr <= 1 and (next_hr > 1 or next_hr is null) then score end) as hr_1,
       max(case when hr <= 2 and (next_hr > 2 or next_hr is null) then score end) as hr_2,
       max(case when hr <= 3 and (next_hr > 3 or next_hr is null) then score end) as hr_3,
       max(case when hr <= 4 and (next_hr > 4 or next_hr is null) then score end) as hr_4,
       max(case when hr <= 5 and (next_hr > 5 or next_hr is null) then score end) as hr_5,
       max(case when hr <= 6 and (next_hr > 6 or next_hr is null) then score end) as hr_6,
       max(case when hr <= 7 and (next_hr > 7 or next_hr is null) then score end) as hr_7,
       max(case when hr <= 8 and (next_hr > 8 or next_hr is null) then score end) as hr_8
from u
group by id;

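This first removes the duplicates, keeping the earliest row per id and hr, and then uses lead() to attach the next hour that has data, so each score is valid from hr up to (but not including) next_hr, or through the last hour when next_hr is null. The conditional aggregation then picks, for each hour column, the single score whose range covers that hour.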
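The "range for when the score is valid" can also be made explicit by expanding each deduplicated row into the hours it covers, which yields the filled long-format data that a pivot then turns into columns. This is an alternative to, not part of, the query above: a PostgreSQL-only sketch (it relies on generate_series) that assumes hours run 0 to 8; u holds the deduplicated sample rows inline, and hourly_scores is a hypothetical view name.

```sql
-- PostgreSQL sketch: expand each deduplicated row into the hours it covers.
-- Assumes hours run 0..8, so a NULL next_hr means "valid through hour 8".
CREATE TEMP VIEW hourly_scores AS
WITH u (id, hr, score) AS (
    -- earliest row per (id, hr) from the sample data
    VALUES (1, 0, 2765), (1, 5, 1593), (1, 7, 1604),
           (2, 0, 3500), (2, 2, 2897), (2, 4, 2400), (2, 6, 1647)
),
ranged AS (
    SELECT id, hr, score,
           lead(hr) OVER (PARTITION BY id ORDER BY hr) AS next_hr
    FROM u
)
SELECT r.id, h.hour_slot, r.score
FROM ranged r
-- the score is valid from hr up to, but not including, next_hr
CROSS JOIN LATERAL generate_series(r.hr, coalesce(r.next_hr, 9) - 1) AS h(hour_slot);

SELECT * FROM hourly_scores ORDER BY id, hour_slot;
```

It returns 18 rows, nine per id; for example, id 1 gets 2765 for hours 0 to 4.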

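DECLARE @columns VARCHAR(MAX) = '',
        @sql VARCHAR(MAX) = '';

-- Build a column list such as [0],[2],[4],... from the hours present in the data
SELECT @columns += QUOTENAME(hr) + ','
FROM (
    SELECT DISTINCT hr
    FROM sample_table
) M;
SET @columns = LEFT(@columns, LEN(@columns) - 1);   -- drop the trailing comma
--SELECT @columns

-- MAX(score) per (ID, hr); hours with no rows stay NULL (no backfilling)
SET @sql = '
SELECT * FROM
(
    SELECT ID, hr, score FROM sample_table
) t
PIVOT (MAX(score)
       FOR hr IN (' + @columns + ')
) AS pivot_table;';
EXEC (@sql);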

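Output:

+----+------+------+------+------+------+------+
| ID |    0 |    2 |    4 |    5 |    6 |    7 |
+----+------+------+------+------+------+------+
|  1 | 2765 | NULL | NULL | 1593 | NULL | 1604 |
|  2 | 3500 | 2897 | 2400 | NULL | 1647 | NULL |
+----+------+------+------+------+------+------+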

Try this.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM