
How to populate the columns of a new Hive table from multiple existing tables?

I have created a new table in Hive (T1) with columns c1, c2, c3 and c4. I want to populate this table by querying other existing tables (T2, T3).

For example, c1 and c2 come from a query run on T2, and the other columns, c3 and c4, come from a query run on T3. Is this possible in Hive? I have done a lot of research but have not been able to find a solution.

Wouldn't something like this work?

CREATE TABLE T1 AS
SELECT t2.c1, t2.c2, t3.c3, t3.c4
FROM (some query against T2) t2
JOIN (some query against T3) t3

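Obviously, replace JOIN with whatever kind of join and join condition you need. I assume some join between T2 and T3 is possible, or else you wouldn't be putting their columns alongside each other in T1. Note that CREATE TABLE ... AS SELECT fails if T1 already exists, so either drop the empty T1 first or run the same SELECT through INSERT INTO TABLE T1 instead.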
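As a minimal sketch of that idea, assuming T2 and T3 share a join key: the `id` column below is hypothetical and not part of the question, so substitute whatever actually relates the two tables. This version inserts into the T1 that already exists, matching the SELECT list to T1's columns by position.

```sql
-- Assumption: both T2 and T3 have an `id` column that relates their rows.
-- T1 (c1, c2, c3, c4) already exists; columns are matched by position.
INSERT INTO TABLE T1
SELECT a.c1, a.c2, b.c3, b.c4
FROM (SELECT id, c1, c2 FROM T2) a
JOIN (SELECT id, c3, c4 FROM T3) b
  ON a.id = b.id;
```

Each subquery stands in for "some query against T2/T3" and can carry its own WHERE clause or aggregation, as long as it still exposes the join key.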

According to the Hive documentation, you can use the following syntax to insert data:

INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Note the following caveat from the documentation:

Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to.
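To illustrate the NULL workaround described in that note, here is a small sketch that fills only c1 and c2 from T2 and supplies NULL placeholders for c3 and c4. The STRING type in the casts is an assumption, since the question does not give the column types; use the types T1 actually declares.

```sql
-- Assumption: c3 and c4 are declared as STRING in T1.
-- Every column of T1 gets a value; the ones we cannot fill yet get NULL.
INSERT INTO TABLE T1
SELECT c1, c2, CAST(NULL AS STRING), CAST(NULL AS STRING)
FROM T2;
```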

So I would JOIN the two existing tables and then insert only the needed values into the target table by choosing them in the SELECT list. Alternatively, creating a temporary table first may give you more control over the data. Just remember to handle NULLs, as stated in the official documentation. This is only one idea, and there are probably other ways to achieve what you need, but it could be a good place to start.
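The temporary-table variant could look roughly like the sketch below. The join key `id` and the staging table name `tmp_t1_stage` are hypothetical, and CREATE TEMPORARY TABLE assumes a Hive version that supports it (0.14 or later). A LEFT OUTER JOIN is used here so that rows of T2 without a match in T3 are kept with NULL in c3 and c4, which makes the NULL handling explicit.

```sql
-- Assumptions: `id` is a shared key; tmp_t1_stage is an illustrative name.
-- Stage the joined rows so they can be inspected before loading T1.
CREATE TEMPORARY TABLE tmp_t1_stage AS
SELECT a.c1, a.c2, b.c3, b.c4
FROM T2 a
LEFT OUTER JOIN T3 b
  ON a.id = b.id;

-- Check how many staged rows had no match in T3 (c3 is NULL for those).
SELECT COUNT(*) FROM tmp_t1_stage WHERE c3 IS NULL;

-- Load T1, supplying a value (possibly NULL) for every column.
INSERT INTO TABLE T1
SELECT c1, c2, c3, c4
FROM tmp_t1_stage;
```

The staging step costs an extra write, but it lets you validate or clean the joined data before it reaches T1.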
