Hive - create an internal table from three external tables
I have three external tables in Hive:
Table 1:
CREATE EXTERNAL TABLE IF NOT EXISTS table_1(
unique_key_column_1 VARCHAR,
column_needed_1 DATE,
redundant_column_1 VARCHAR,
redundant_column_2 VARCHAR,
redundant_column_3 VARCHAR,
column_needed_2 TIMESTAMP,
redundant_column_4 VARCHAR,
redundant_column_5 VARCHAR,
column_needed_3 INT,
redundant_column_6 VARCHAR,
redundant_column_7 VARCHAR)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE location '/user/<username>/visdata';
Table 2:
CREATE EXTERNAL TABLE IF NOT EXISTS table_2(
unique_key_column_1 VARCHAR,
column_needed_4 VARCHAR,
column_needed_5 VARCHAR,
unique_key_column_2 VARCHAR,
redundant_column_1 VARCHAR,
redundant_column_2 VARCHAR,
redundant_column_3 VARCHAR,
column_needed_6 TINYINT,
redundant_column_4 VARCHAR,
redundant_column_5 VARCHAR,
column_needed_7 DATE,
redundant_column_6 VARCHAR,
redundant_column_7 VARCHAR)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE location '/user/<username>/visdata';
Table 3:
CREATE EXTERNAL TABLE IF NOT EXISTS table_3(
unique_key_column_2 VARCHAR,
redundant_column_1 VARCHAR,
redundant_column_2 VARCHAR,
redundant_column_3 VARCHAR,
redundant_column_4 VARCHAR,
redundant_column_5 VARCHAR,
column_needed_8 VARCHAR,
column_needed_9 TINYINT,
redundant_column_6 VARCHAR,
redundant_column_7 VARCHAR,
column_needed_10 TIMESTAMP)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE location '/user/<username>/visdata';
I now want to create a managed table by left outer joining the tables above on my two unique key columns, like this:
unique_key_column_1 | column_needed_1 | column_needed_2 | column_needed_3 | column_needed_4 | column_needed_5 | column_needed_6 | column_needed_7 | unique_key_column_2 | column_needed_8 | column_needed_9 | column_needed_10
key_entry_1_1 | entry_1_1 | entry_1_2 | entry_1_3 | entry_1_4 | entry_1_5 | entry_1_6 | entry_1_7 | key_entry_1_2 | entry_1_8 | entry_1_9 | entry_1_10
key_entry_2_1 | entry_2_1 | entry_2_2 | entry_2_3 | entry_2_4 | entry_2_5 | entry_2_6 | entry_2_7 | key_entry_2_2 | entry_2_8 | entry_2_9 | entry_2_10
How do I do this?
EDIT 1:
Here's what I could come up with to join two of the tables. I still couldn't figure out how to join all three tables into a single table:
> create table combined_table;
> insert into combined_table SELECT r.unique_key_column_1, r.column_needed_1, r.column_needed_2, r.column_needed_3, o.column_needed_4, o.column_needed_5, o.column_needed_6, o.column_needed_7 FROM table_1 r LEFT OUTER JOIN table_2 o ON (r.unique_key_column_1 = o.unique_key_column_1);
EDIT 2:
I just realised that joins are expensive. So, is there any way I can do this using partitions?
@NaveenKumar The solution here is to write the schema for the combinedTable you want, then insert the joined results from the 3 tables into that final table.
INSERT INTO combinedTable [SELECT JOIN QUERY HERE]
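The schema this comment refers to could look roughly like the sketch below. It is only an illustration: the column types are copied from the three external tables in the question, `STRING` is an assumed stand-in for the unsized `VARCHAR` (Hive expects a length, as in `VARCHAR(255)`), and the ORC storage format is an assumption, not something the thread specifies.

```sql
-- Managed (internal) table: no EXTERNAL keyword and no LOCATION clause,
-- so Hive keeps the data under its own warehouse directory by default.
CREATE TABLE IF NOT EXISTS combinedTable (
    unique_key_column_1 STRING,     -- assumed replacement for VARCHAR
    column_needed_1     DATE,
    column_needed_2     TIMESTAMP,
    column_needed_3     INT,
    column_needed_4     STRING,
    column_needed_5     STRING,
    column_needed_6     TINYINT,
    column_needed_7     DATE,
    unique_key_column_2 STRING,
    column_needed_8     STRING,
    column_needed_9     TINYINT,
    column_needed_10    TIMESTAMP
)
STORED AS ORC;  -- assumption: any format supported by your Hive version works
```

The column order matters here, because a plain `INSERT INTO ... SELECT` maps the selected columns to the target columns by position, not by name.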
You can create the combined table by left joining all three tables. Check the query below.
Creating the table and inserting the data:
CREATE TABLE IF NOT EXISTS COMBINED_TABLE AS
SELECT
    TBLA.UNIQUE_KEY_COLUMN_1,   -- qualified: the column exists in both TABLE_1 and TABLE_2
    TBLA.COLUMN_NEEDED_1,
    TBLA.COLUMN_NEEDED_2,
    TBLA.COLUMN_NEEDED_3,
    TBLB.COLUMN_NEEDED_4,
    TBLB.COLUMN_NEEDED_5,
    TBLB.COLUMN_NEEDED_6,
    TBLB.COLUMN_NEEDED_7,
    TBLC.UNIQUE_KEY_COLUMN_2,
    TBLC.COLUMN_NEEDED_8,
    TBLC.COLUMN_NEEDED_9,
    TBLC.COLUMN_NEEDED_10       -- no trailing comma before FROM
FROM
    TABLE_1 TBLA
LEFT JOIN
    TABLE_2 TBLB
    ON TBLA.UNIQUE_KEY_COLUMN_1 = TBLB.UNIQUE_KEY_COLUMN_1
LEFT JOIN
    TABLE_3 TBLC
    ON TBLB.UNIQUE_KEY_COLUMN_2 = TBLC.UNIQUE_KEY_COLUMN_2;  -- TABLE_2 and TABLE_3 share key 2
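To confirm that the result is a managed table and that the left joins behaved as expected, a quick check along these lines may help. This is a sketch that assumes the `COMBINED_TABLE` created by the query above; the count comparison is just one possible sanity check, not part of the original answer.

```sql
-- "Table Type" in the output should read MANAGED_TABLE rather than EXTERNAL_TABLE.
DESCRIBE FORMATTED combined_table;

-- A left join keeps every row of TABLE_1, so the combined table should have
-- at least as many rows. A higher count means some key matched several rows
-- in TABLE_2 or TABLE_3.
SELECT COUNT(*) FROM table_1;
SELECT COUNT(*) FROM combined_table;
```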
Inserting data into the table if the target table has already been created:
INSERT INTO COMBINED_TABLE
SELECT
    TBLA.UNIQUE_KEY_COLUMN_1,   -- qualified: the column exists in both TABLE_1 and TABLE_2
    TBLA.COLUMN_NEEDED_1,
    TBLA.COLUMN_NEEDED_2,
    TBLA.COLUMN_NEEDED_3,
    TBLB.COLUMN_NEEDED_4,
    TBLB.COLUMN_NEEDED_5,
    TBLB.COLUMN_NEEDED_6,
    TBLB.COLUMN_NEEDED_7,
    TBLC.UNIQUE_KEY_COLUMN_2,
    TBLC.COLUMN_NEEDED_8,
    TBLC.COLUMN_NEEDED_9,
    TBLC.COLUMN_NEEDED_10       -- no trailing comma before FROM
FROM
    TABLE_1 TBLA
LEFT JOIN
    TABLE_2 TBLB
    ON TBLA.UNIQUE_KEY_COLUMN_1 = TBLB.UNIQUE_KEY_COLUMN_1
LEFT JOIN
    TABLE_3 TBLC
    ON TBLB.UNIQUE_KEY_COLUMN_2 = TBLC.UNIQUE_KEY_COLUMN_2;  -- TABLE_2 and TABLE_3 share key 2
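On the partition question from EDIT 2: partitioning does not replace the join, since the columns still have to be brought together from three tables. What it can do is lay out the combined table so that later queries read only the partitions they need. The sketch below shows one way this might look. The table name `combined_table_partitioned`, the choice of `column_needed_1` as the partition column, and the `STRING` types are all assumptions made for illustration.

```sql
-- Allow Hive to derive partition values from the query result.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Hypothetical partitioned managed table; the partition column is declared
-- in PARTITIONED BY and therefore not repeated in the column list.
CREATE TABLE IF NOT EXISTS combined_table_partitioned (
    unique_key_column_1 STRING,
    column_needed_2     TIMESTAMP,
    column_needed_3     INT,
    column_needed_4     STRING,
    column_needed_5     STRING,
    column_needed_6     TINYINT,
    column_needed_7     DATE,
    unique_key_column_2 STRING,
    column_needed_8     STRING,
    column_needed_9     TINYINT,
    column_needed_10    TIMESTAMP
)
PARTITIONED BY (column_needed_1 DATE);

-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE combined_table_partitioned PARTITION (column_needed_1)
SELECT
    a.unique_key_column_1,
    a.column_needed_2,
    a.column_needed_3,
    b.column_needed_4,
    b.column_needed_5,
    b.column_needed_6,
    b.column_needed_7,
    c.unique_key_column_2,
    c.column_needed_8,
    c.column_needed_9,
    c.column_needed_10,
    a.column_needed_1
FROM table_1 a
LEFT JOIN table_2 b
    ON a.unique_key_column_1 = b.unique_key_column_1
LEFT JOIN table_3 c
    ON b.unique_key_column_2 = c.unique_key_column_2;
```

A date column with many distinct values produces many small partitions, so a coarser key (for example a month derived from the date) may be a better fit depending on the data.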