[英]Hive - updating a partition column
我有一个按日期和产品类型分区的Hive表
product_id, sale_id, date, product_type
42342423, 43423, 2017-01-01, S
67867868, 23233, 2017-01-01, C
53453466, 63423, 2017-02-01, S
我需要将product_type的所有值从'S'更新为'T'(Shirts to Tops)。 由于我们的Hive版本不支持,因此无法直接更新。
像这样发布的其他解决方案涉及创建一个新表并使用insert overwrite
和case
语句 - 类似于
INSERT OVERWRITE TABLE data.textile_sales PARTITION(date='2017-01-01')
select product_id, sale_id, case when product_type = 'S' then 'T' end as product_type, date
但是如果要更新的列是分区,这将不起作用。
有没有其他方法可以解决这个问题?
分区列“data”实际上是与目录相关的元数据。
如果您已经有'T'文件夹,则将文件从当前日期+ product_type ='S'文件夹移动到相应的日期+ product_type ='T'文件夹。
如果您没有'T'文件夹,只需重命名'S'文件夹并更新分区列表即可。
演示
hive> select * from product;
OK
67867868 23233 2017-01-01 C
42342423 43423 2017-01-01 S
53453466 63423 2017-01-02 S
[training@localhost ~]$ hdfs dfs -ls -R /user/hive/warehouse/product
drwxrwxrwx - training hive 0 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01
drwxrwxrwx - training hive 0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C
-rwxrwxrwx 1 training hive 15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C/000000_0
drwxrwxrwx - training hive 0 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01/product_type=S
-rwxrwxrwx 1 training hive 15 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01/product_type=S/000000_0
drwxrwxrwx - training hive 0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02
drwxrwxrwx - training hive 0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02/product_type=S
-rwxrwxrwx 1 training hive 15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02/product_type=S/000000_0
[training@localhost ~]$ hdfs dfs -mkdir /user/hive/warehouse/product/date=2017-01-01/product_type=T
[training@localhost ~]$ hdfs dfs -mkdir /user/hive/warehouse/product/date=2017-01-02/product_type=T
[training@localhost ~]$ hdfs dfs -mv /user/hive/warehouse/product/date=2017-01-01/product_type=S/000000_0 /user/hive/warehouse/product/date=2017-01-01/product_type=T/000000_0
[training@localhost ~]$ hdfs dfs -mv /user/hive/warehouse/product/date=2017-01-02/product_type=S/000000_0 /user/hive/warehouse/product/date=2017-01-02/product_type=T/000000_0
[training@localhost ~]$ hdfs dfs -ls -R /user/hive/warehouse/product
drwxrwxrwx - training hive 0 2017-01-10 13:41 /user/hive/warehouse/product/date=2017-01-01
drwxrwxrwx - training hive 0 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C
-rwxrwxrwx 1 training hive 15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-01/product_type=C/000000_0
drwxrwxrwx - training hive 0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-01/product_type=S
drwxrwxrwx - training hive 0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-01/product_type=T
-rwxrwxrwx 1 training hive 15 2017-01-10 13:35 /user/hive/warehouse/product/date=2017-01-01/product_type=T/000000_0
drwxrwxrwx - training hive 0 2017-01-10 13:41 /user/hive/warehouse/product/date=2017-01-02
drwxrwxrwx - training hive 0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-02/product_type=S
drwxrwxrwx - training hive 0 2017-01-10 13:42 /user/hive/warehouse/product/date=2017-01-02/product_type=T
-rwxrwxrwx 1 training hive 15 2017-01-10 13:36 /user/hive/warehouse/product/date=2017-01-02/product_type=T/000000_0
hive> msck repair table product;
OK
Partitions not in metastore: product:date=2017-01-01/product_type=T product:date=2017-01-02/product_type=T
Repair: Added partition to metastore product:date=2017-01-01/product_type=T
Repair: Added partition to metastore product:date=2017-01-02/product_type=T
Time taken: 0.409 seconds, Fetched: 3 row(s)
hive> select * from product;
OK
67867868 23233 2017-01-01 C
42342423 43423 2017-01-01 T
53453466 63423 2017-01-02 T
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.