How to insert from a select query with dynamic partitioning on a column in Hive?

I'm trying to insert into a computed partition. The partition's value needs to be computed from a key column. Assume that key_2 in the example always has 10 characters. I want to use its last 3 characters as the partition value. I need dynamic partitioning.

My table is similar to this:

DROP TABLE exampledb.exampletable;
CREATE TABLE exampledb.exampletable (
    key_1 STRING,
    key_2 STRING,
    col_1 STRING,
    col_2 STRING
)
PARTITIONED BY (my_part STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\u0001'
;

I've tried multiple variants of the HQL below:

INSERT OVERWRITE TABLE exampledb.exampletable
PARTITION(my_part)
SELECT 
    key_1,
    key_2,
    col_1,
    col_2,
    SUBSTR(key_2, -3) as my_part    -- not sure how to insert partition
FROM exampledb.exampletable_temp;

I couldn't figure out the correct solution for this. I always get a syntax error.

Does anyone know the solution for this? Thanks

UPDATE:

FAILED: SemanticException Partition spec {my_part=null} contains non-partition columns

UPDATE 2:

I've also tried to avoid NULL values by using this solution (as was proposed below), but the error is the same:

INSERT OVERWRITE TABLE hvdb_as_aqua_guk_core.hvtb_aqua_guk_finding_mgn
PARTITION(my_part) ( key_1, key_2, col_1, col_2, my_part    )
SELECT 
        key_1,
        key_2,
        col_1,
        col_2,
        SUBSTR(key_2, -3) as my_part    -- not sure how to insert partition
FROM hvdb_as_aqua_guk_core.hvtb_aqua_guk_finding_mgn_temp2
WHERE key_2 IS NOT NULL
    AND SUBSTR(key_2, -3) IS NOT NULL;

You should explicitly specify all the column names you are inserting into. For example, your command should be something like this:

INSERT OVERWRITE TABLE exampledb.exampletable
PARTITION(my_part)(key_1, key_2, col_1, col_2, my_part)
SELECT 
    key_1,
    key_2,
    col_1,
    col_2,
    SUBSTR(key_2, -3)
FROM exampledb.exampletable_temp;

This should work.
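Separately from the column list, an insert in which every partition column is dynamic generally needs dynamic partitioning enabled and, on Hive versions where the default mode is strict, the mode set to nonstrict. A minimal sketch of the session settings, assuming they are not already configured cluster-wide; this may not be what triggers the SemanticException quoted in the question:

```sql
-- Allow dynamic partition inserts (already the default in recent Hive releases).
SET hive.exec.dynamic.partition=true;

-- strict mode requires at least one static partition column;
-- nonstrict lets every partition column (here my_part) be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;
```

These are session-level settings, so they would need to run in the same session as the INSERT statement.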

UPDATE

I tried to create a test case, and INSERT OVERWRITE doesn't seem to work, but INSERT INTO is working. A workaround could be to delete all data from the destination table with TRUNCATE TABLE exampledb.exampletable, or delete all data from a specific partition with TRUNCATE TABLE test6 PARTITION (my_part = '001');, then run an INSERT INTO:

INSERT INTO exampledb.exampletable
PARTITION(my_part)(key_1, key_2, col_1, col_2, my_part)
SELECT
    key_1,
    key_2,
    col_1,
    col_2,
    SUBSTR(key_2, -3)
FROM exampledb.exampletable_temp;
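To check the result afterwards, the partition expression and the partitions that were created can be inspected. A short sketch, assuming the table names from the question; the literal '0123456789' is only a made-up 10-character sample key, and a SELECT without a FROM clause assumes a Hive version that supports it:

```sql
-- SUBSTR with a negative start counts from the end of the string,
-- so this returns the last three characters of the sample value.
SELECT SUBSTR('0123456789', -3);

-- List the partitions that the dynamic insert created.
SHOW PARTITIONS exampledb.exampletable;

-- Row count per partition, to confirm rows landed where expected.
SELECT my_part, COUNT(*) AS row_count
FROM exampledb.exampletable
GROUP BY my_part;
```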
