Consolidate, Combine, Merge Rows

Every search I do leads me to results for people seeking array_agg to combine multiple columns in a row into column. That's not what I am trying to figure out here, and maybe I am not using the right search terms (eg, consolidate, combine, merge).

I am trying to combine rows by populating values in fields ... I am not sure the best way to describe this other than with an example:

Current:
--------------------------------
 id  num_1  num_2  num_3  num_4 
--------------------------------
 1    111    222     0      0   
 2    111    333     0      0   
 3    111     0      0     444  
 4     0     222    555     0   
 5    777    999     0      0   
 6     0     999    888     0   

After Processing:
--------------------------------
 id  num_1  num_2  num_3  num_4 
--------------------------------
 1    111    222    555    444  
 2    111    333    555    444  
 3    111    333    555    444  
 4    111    222    555    444  
 5    777    999    888     0   
 6    777    999    888     0   


After Deleting Duplicate Rows:
--------------------------------
 id  num_1  num_2  num_3  num_4 
--------------------------------
 1    111    222    555    444  
 2    111    333    555    444  
 3    777    999    888     0   

This will likely be a two-step process: first fill in the blanks, then find and delete the duplicates. I can do the second step, but I am having trouble figuring out how to first populate the 0 values with values from another row, where one column might have two different values (ids 1 and 2 for the num_2 column) but another has only one value (e.g., 111 for num_1).

I can do it in PHP, but would like to figure out how to do it using only Postgres.

EDIT: My example table is a relations table. I have multiple datasets with similar information (e.g., username) but different registration ID numbers. So, I do an inner join on table 1 and table 2 (for example) where the username is the same. Then I take the registration IDs (which are different) from each table and insert them as a row into my relations table. In my example tables above, Row 1 has two different registration IDs from the two tables I joined: the values 111 (num_1) and 222 (num_2) are inserted into the table, and zeros are inserted for num_3 and num_4. Then I compare table 1 and table 4, and the values 111 (num_1) and 444 (num_4) get inserted into the relations table, with zeros for num_2 and num_3. Since registration ID 111 is related to registration ID 222 and registration ID 111 is related to registration ID 444, then registration IDs 111, 222, and 444 are all related (meaning the username is the same for each of those registration IDs). Does that help to clarify?

EDIT 2: I corrected Tables 2 and 3. Hopefully now it makes sense. The username column is not unique. So, I have 4 tables like this:

Table 1:

bob  - 111
mary - 777

Table 2:

bob  - 222
bob  - 333
mary - 999

Table 3:

bob  - 555
mary - 888

Table 4:

bob  - 444  -- mary does not exist in this table

So, in my relations table I should end up with 3 rows as given in example Table 3 above.

If your values are always increasing (as in the example), then just use a cumulative maximum and then select the distinct rows:

select row_number() over (order by min(id)) as id,
       t.num1, t.num2, t.num3, t.num4
from (select id,
             max(num1) over (order by id) as num1,
             max(num2) over (order by id) as num2,
             max(num3) over (order by id) as num3,
             max(num4) over (order by id) as num4
      from t
     ) t
group by t.num1, t.num2, t.num3, t.num4;

If max() doesn't work, then what you really want is lag(... ignore nulls). That is not yet available. Perhaps the simplest method is then correlated subqueries for each column:

select row_number() over (order by min(id)) as id,
       t.num1, t.num2, t.num3, t.num4
from (select id,
             (select t2.num1 from t t2 where t2.id <= t.id and t2.num1 <> 0 order by t2.id desc limit 1
             ) as num1,
             (select t2.num2 from t t2 where t2.id <= t.id and t2.num2 <> 0 order by t2.id desc limit 1
             ) as num2,
             (select t2.num3 from t t2 where t2.id <= t.id and t2.num3 <> 0 order by t2.id desc limit 1
             ) as num3,
             (select t2.num4 from t t2 where t2.id <= t.id and t2.num4 <> 0 order by t2.id desc limit 1
             ) as num4
      from t
     ) t
group by t.num1, t.num2, t.num3, t.num4;

This version would not be very efficient on even medium sized tables.

A more efficient version is more complicated:

select row_number() over (order by t.id) as id,
       t1.num1, t2.num2, t3.num3, t4.num4
from (-- one row per distinct combination of "latest non-zero" source ids
      select min(id) as id, num1_id, num2_id, num3_id, num4_id
      from (select id,
                   -- id of the most recent row (by id) with a non-zero value
                   max(case when num1 > 0 then id end) over (order by id) as num1_id,
                   max(case when num2 > 0 then id end) over (order by id) as num2_id,
                   max(case when num3 > 0 then id end) over (order by id) as num3_id,
                   max(case when num4 > 0 then id end) over (order by id) as num4_id
            from t
           ) t
      group by num1_id, num2_id, num3_id, num4_id
     ) t left join
     -- look up the actual values by joining back on the saved ids
     t t1
     on t1.id = t.num1_id left join
     t t2
     on t2.id = t.num2_id left join
     t t3
     on t3.id = t.num3_id left join
     t t4
     on t4.id = t.num4_id;

EDIT:

That was a little silly. There is an easier way using first_value() (which Postgres unfortunately does not support as an aggregation function):

select row_number() over (order by min(id)) as id,
       num1, num2, num3, num4
from (select id,
             first_value(num1) over (order by (case when num1 is not null then id end) nulls last
                                    ) as num1,
             first_value(num2) over (order by (case when num2 is not null then id end) nulls last
                                    ) as num2,
             first_value(num3) over (order by (case when num3 is not null then id end) nulls last
                                    ) as num3,
             first_value(num4) over (order by (case when num4 is not null then id end) nulls last
                                    ) as num4
      from t
     ) t
group by num1, num2, num3, num4;

It seems like you started in the middle of a presumed solution, forgetting to present the initial problem. Based on your added information I suggest a completely different, much simpler solution. You have:

CREATE TABLE table1 (username text, registration_id int);
CREATE TABLE table2 (LIKE table1);
CREATE TABLE table3 (LIKE table1);
CREATE TABLE table4 (LIKE table1);

INSERT INTO table1 VALUES ('bob', 111), ('mary', 777);
INSERT INTO table2 VALUES ('bob', 222), ('bob', 333), ('mary', 999);
INSERT INTO table3 VALUES ('bob', 555), ('mary', 888);
INSERT INTO table4 VALUES ('bob', 444); -- no mary

Solution

What you really seem to need is FULL [OUTER] JOIN. Details in the manual on FROM and JOIN.

-- CREATE TABLE relations AS
SELECT username
     , t1.registration_id AS reg1
     , t2.registration_id AS reg2
     , t3.registration_id AS reg3
     , t4.registration_id AS reg4
FROM   table1     t1
FULL   JOIN table2 t2 USING (username)
FULL   JOIN table3 t3 USING (username)
FULL   JOIN table4 t4 USING (username)
ORDER  BY username;

That's all. Produces your desired result directly.

username  reg1  reg2  reg3  reg4
---------------------------------
bob       111   222   555   444
bob       111   333   555   444
mary      777   999   888   (null)
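
If the relations table should keep the question's convention of 0 for "no match" instead of NULL, one possible variant wraps each column in COALESCE. This is only a sketch against the tables defined above. The output names num_1 to num_4 are borrowed from the question, and it assumes 0 is never a real registration ID.

```sql
-- Sketch: same FULL JOIN as above, but missing registrations become 0.
-- Assumption: 0 is never used as a genuine registration_id.
SELECT username
     , COALESCE(t1.registration_id, 0) AS num_1
     , COALESCE(t2.registration_id, 0) AS num_2
     , COALESCE(t3.registration_id, 0) AS num_3
     , COALESCE(t4.registration_id, 0) AS num_4
FROM   table1     t1
FULL   JOIN table2 t2 USING (username)
FULL   JOIN table3 t3 USING (username)
FULL   JOIN table4 t4 USING (username)
ORDER  BY username;
```

With the sample data, mary's row should then show 0 in the last column rather than NULL.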

Your given example would work with LEFT JOIN as well, since all missing entries are to the right. But that would fail in other constellations. I added some more revealing test cases in the SQL Fiddle.
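
To illustrate one such constellation without touching the real tables, here is a small self-contained sketch. The user 'alice' and her registration ID 123 are made up for this example. She exists only in the second data set, so a LEFT JOIN starting from the first one drops her, while a FULL JOIN keeps her.

```sql
-- Hypothetical data: 'alice' / 123 is invented and exists only in t2.
WITH t1(username, registration_id) AS (VALUES ('bob', 111))
   , t2(username, registration_id) AS (VALUES ('bob', 222), ('alice', 123))
SELECT 'left' AS join_type, username
     , t1.registration_id AS reg1
     , t2.registration_id AS reg2
FROM   t1
LEFT   JOIN t2 USING (username)   -- only usernames present in t1 survive
UNION  ALL
SELECT 'full', username
     , t1.registration_id
     , t2.registration_id
FROM   t1
FULL   JOIN t2 USING (username)   -- usernames from either side survive
ORDER  BY join_type, username;
```

The rows tagged 'full' should include alice with a NULL reg1; the rows tagged 'left' contain only bob.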

I assume you are aware that multiple entries in multiple tables will produce a huge number of output rows:
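
As a rough way to gauge this before building the relations table, the per-user row count of the joined result can be estimated from the per-table counts. This is a sketch rather than part of the solution above: the aliases c1 to c4 and the column joined_rows are names chosen for this example, and it assumes username is never NULL.

```sql
-- Sketch: expected number of joined rows per username.
-- A user missing from a table still contributes one (NULL) row, hence COALESCE(.., 1).
SELECT username
     , COALESCE(c1.ct, 1) * COALESCE(c2.ct, 1)
     * COALESCE(c3.ct, 1) * COALESCE(c4.ct, 1) AS joined_rows
FROM      (SELECT username, count(*) AS ct FROM table1 GROUP BY username) c1
FULL JOIN (SELECT username, count(*) AS ct FROM table2 GROUP BY username) c2 USING (username)
FULL JOIN (SELECT username, count(*) AS ct FROM table3 GROUP BY username) c3 USING (username)
FULL JOIN (SELECT username, count(*) AS ct FROM table4 GROUP BY username) c4 USING (username)
ORDER  BY joined_rows DESC;
```

For the sample data this gives 2 for bob (1 x 2 x 1 x 1) and 1 for mary. A user with, say, 3 entries in each of the four tables would already produce 81 rows.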
