
Is it possible to change partition metadata in HIVE?

This is an extension of a previous question I asked: How to compare two columns with different data type groups

We are exploring the idea of changing the metadata on the table as opposed to performing a CAST operation on the data in SELECT statements. Changing the metadata in the MySQL metastore is easy enough. But is it possible to have that metadata change applied to partitions (they are daily)? Otherwise, we might be stuck with current and future data being of type BIGINT while the historical data is STRING.

Question: Is it possible to change partition metadata in HIVE? If yes, how?

You can change the partition column type using this statement:

alter table {table_name} partition column ({column_name} {column_type});
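The statement above targets the partition key column itself. If the column whose type differs between old and new partitions is a regular (non-partition) column, a sketch along the following lines may be closer to what the question asks. The table `page_views`, the column `user_id`, and the `dt` value are hypothetical names used only for illustration. The `CASCADE` clause is available from Hive 1.1.0; depending on the Hive version and the `hive.metastore.disallow.incompatible.col.type.changes` setting, a STRING to BIGINT change may be rejected.

```sql
-- Hypothetical table: page_views (user_id STRING, ...) PARTITIONED BY (dt STRING)

-- Change the column type in the table metadata and, via CASCADE,
-- in the metadata of every existing partition (Hive 1.1.0 and later).
ALTER TABLE page_views CHANGE COLUMN user_id user_id BIGINT CASCADE;

-- Alternatively, change the metadata of a single partition only.
ALTER TABLE page_views PARTITION (dt = '2015-06-01')
  CHANGE COLUMN user_id user_id BIGINT;

-- Inspect the column types that one partition now reports.
DESCRIBE FORMATTED page_views PARTITION (dt = '2015-06-01');
```

These statements change metadata only; the data files are not rewritten, so the stored values must still be readable as the new type.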

You can also re-create the table definition and change all column types using these steps:

  1. Make your table external, so it can be dropped without dropping the data:

    ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');

  2. Drop the table (only metadata will be removed).

  3. Create an EXTERNAL table using the updated DDL, with the types changed and with the same LOCATION.
  4. Recover partitions:

    MSCK [REPAIR] TABLE tablename;

The equivalent command on Amazon Elastic MapReduce (EMR)'s version of Hive is:

ALTER TABLE tablename RECOVER PARTITIONS;

This will add the Hive partition metadata. See the manual: RECOVER PARTITIONS

  5. Finally, you can make your table MANAGED again if necessary:

ALTER TABLE tablename SET TBLPROPERTIES('EXTERNAL'='FALSE');
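Put together, the steps above might look like the following sketch. The column list, partition column, storage format, and LOCATION are hypothetical placeholders; in practice they should be copied from the original table's DDL (for example from `SHOW CREATE TABLE abc`), with only the column types changed.

```sql
-- 1. Make the table external so that DROP removes only the metadata.
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='TRUE');

-- 2. Drop the table definition; the files under LOCATION are kept.
DROP TABLE abc;

-- 3. Re-create the table with the new types (hypothetical columns and path).
CREATE EXTERNAL TABLE abc (
  id      BIGINT,   -- was STRING in the old definition (assumed)
  payload STRING
)
PARTITIONED BY (dt STRING)
STORED AS TEXTFILE  -- must match the format of the existing files
LOCATION '/path/to/abc';

-- 4. Re-register the partition directories found under LOCATION.
MSCK REPAIR TABLE abc;

-- 5. Optionally turn the table back into a managed table.
ALTER TABLE abc SET TBLPROPERTIES('EXTERNAL'='FALSE');
```

Because the partitions are re-created from the new table definition, they should pick up the new column types rather than the old ones.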

Note: All commands above should be run in HUE, not MySQL.

You cannot change the partition column in Hive. In fact, Hive does not support altering partitioning columns.

Refer: altering partition column type in Hive

You can think of it this way: Hive stores the data by creating a folder in HDFS for each partition column value. Altering the partition column would therefore mean changing the whole directory structure and data of the Hive table, which is not possible. For example, if you have partitioned on year, this is what the directory structure looks like:

tab1/clientdata/2009/file2
tab1/clientdata/2010/file3

If you want to change the partition column, you can perform the steps below:

  1. Create another Hive table with the required changes in the partition column:

    Create table new_table ( A int, B String.....)

  2. Load data from the previous table:

    Insert into new_table partition ( B ) select A,B from Prev_table
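A slightly fuller sketch of these two steps, assuming `B` is the partition column and should become BIGINT in the new table. The column types and the cast are assumptions for illustration; the two `SET` lines enable dynamic partitioning so that Hive derives each partition from the value of `B` in the SELECT.

```sql
-- New table with the partition column declared with the desired type.
CREATE TABLE new_table (A INT)
PARTITIONED BY (B BIGINT);

-- Allow Hive to create partitions from the values produced by the query.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must come last in the SELECT list.
INSERT INTO TABLE new_table PARTITION (B)
SELECT A, CAST(B AS BIGINT) AS B
FROM Prev_table;
```

Unlike a metadata-only change, this rewrites the data, so it costs a full copy of the table but leaves every partition consistent with the new definition.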

After I changed the Avro (avsc) schema (see below), I was able to "fix" the (already existing) partition by doing "ADD PARTITION" as per this site:

http://hadooptutorial.info/partitioning-in-hive/

ALTER TABLE partitioned_user ADD PARTITION (country = 'US', state = 'CA')
LOCATION '/hive/external/tables/user/country=us/state=ca'

I changed the Avro schema by doing a sqoop from MySQL (either alter the field in MySQL or CAST() in the SELECT); this modified the avsc file.

I had done multiple things before doing the ADD PARTITION (DROP/CREATE/MSCK TABLE), so I'm not sure whether they are needed (but they hadn't fixed the partition).
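For a partition that is already registered, one way to reproduce this might be to drop its metastore entry and add it again, so that it is re-created from the current table definition. This is a sketch that assumes `partitioned_user` is an EXTERNAL table, in which case dropping a partition removes only the metadata and leaves the files in place; it is not confirmed to be the exact sequence used above.

```sql
-- Remove the stale partition entry (metadata only for an EXTERNAL table).
ALTER TABLE partitioned_user
  DROP IF EXISTS PARTITION (country = 'US', state = 'CA');

-- Register the same directory again under the current table definition.
ALTER TABLE partitioned_user
  ADD IF NOT EXISTS PARTITION (country = 'US', state = 'CA')
  LOCATION '/hive/external/tables/user/country=us/state=ca';

-- Check which column types the partition reports now.
DESCRIBE FORMATTED partitioned_user PARTITION (country = 'US', state = 'CA');
```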

Simple.
