Hive : inner join to update "table a" from "table b" where values in "table b" is different from "table a" ? create a last view (a+b)

Question

I have two tables (table a and table b) in Hive.

The table a is an init table (full). The table b is a delta table with the latest updates (this table has the same columns and 5 more columns including a timestamp column for the last updates).

I want to create a last view (join between full + delta) from the two tables to have the last updates.

1- The first step I did is to select the rows with the max(timestamp) from table b (delta) to have last updates

2- Then I used it like a subquery in an inner join :

SELECT full.*, delta.* 
FROM table a FULL 
INNER JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp 
            DESC) as rn
            FROM b) delta 
ON f.id = d.id and d.rn = 1

The problem is that I have to specify the columns I want to use in the select and I will have duplicated columns with different values when table b is updated.

I need a condition to always select columns from table a and replace columns in table a with columns in table b when values in table b is different from values in table a.

Any solutions please ?

Answer 1

Seems like you are looking for COALESCE or NVL functions. In that case your query may look like:

SELECT
    A.Person_id as Person_id,               -- Person_id is taken from A table
    A.Training_id as Training_id,           -- Training_id is taken from A table
    coalesce(d.Status, A.Status) as Status, -- If Status from Delta table is not null it is taken otherwise Status from A table is taken
    coalesce(d.Name, A.Name) as Name        -- If Name from Delta table is not null it is taken otherwise Name from A table is taken 
FROM A
LEFT JOIN (SELECT                           -- Left outer join to don't filter out records without updates
                B.Status, 
                B.Name, 
                ROW_NUMBER() OVER (PARTITION BY Person_id ORDER BY timestamp DESC) as rn 
            FROM B) d ON (A.Person_id = d.Person_id and d.rn = 1);

Hive : inner join to update "table a" from "table b" where values in "table b" is different from "table a" ? create a last view (a+b)

Question

1 answers

solution1
0 ACCPTED 2020-01-21 10:28:28

Hive : inner join to update "table a" from "table b" where values in "table b" is different from "table a" ? create a last view (a+b)

Question

1 answers

solution1 0 ACCPTED 2020-01-21 10:28:28

solution1
0 ACCPTED 2020-01-21 10:28:28