I have two tables (table a and table b) in Hive.
The table a is an init table (full). The table b is a delta table with the latest updates (this table has the same columns and 5 more columns including a timestamp column for the last updates).
I want to create a last view (join between full + delta) from the two tables to have the last updates.
1- The first step I did is to select the rows with the max(timestamp) from table b (delta) to have last updates
2- Then I used it like a subquery in an inner join :
SELECT full.*, delta.*
FROM table a FULL
INNER JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY timestamp
DESC) as rn
FROM b) delta
ON f.id = d.id and d.rn = 1
The problem is that I have to specify the columns I want to use in the select and I will have duplicated columns with different values when table b is updated.
I need a condition to always select columns from table a and replace columns in table a with columns in table b when values in table b is different from values in table a.
Any solutions please ?
Seems like you are looking for COALESCE or NVL functions. In that case your query may look like:
SELECT
A.Person_id as Person_id, -- Person_id is taken from A table
A.Training_id as Training_id, -- Training_id is taken from A table
coalesce(d.Status, A.Status) as Status, -- If Status from Delta table is not null it is taken otherwise Status from A table is taken
coalesce(d.Name, A.Name) as Name -- If Name from Delta table is not null it is taken otherwise Name from A table is taken
FROM A
LEFT JOIN (SELECT -- Left outer join to don't filter out records without updates
B.Status,
B.Name,
ROW_NUMBER() OVER (PARTITION BY Person_id ORDER BY timestamp DESC) as rn
FROM B) d ON (A.Person_id = d.Person_id and d.rn = 1);
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.