How to add a partition to a Hive table with nested data?
I am loading logs from S3 into Hive:
CREATE TABLE logs(
`col1` struct<`country`:string,`page`:string,`date`:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/sample/' ;
My data looks like this:
{
  "col1": {
    "country": "India",
    "page": "/signup",
    "date": "2018-01-01"
  }
}
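I want to partition on col1.country, col1.page, and col1.date. How do I declare those partitions in the CREATE statement? I tried colName.fieldName, but it did not work.
You can declare the partition columns directly, without referencing the struct column, as follows: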
CREATE TABLE logs(
`col1` struct<`country`:string,`page`:string,`date`:string>
)
partitioned by (country string, page string, `date` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/sample/' ;
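Note that external tables do not detect partitions automatically; you have to alter the table and add each partition explicitly, as follows:
ALTER TABLE logs ADD PARTITION (country='india', page='whatever', `date`='whatever') LOCATION '/hdfs/path/';
-- You might also need to repair the table at the end
MSCK REPAIR TABLE schemaName.tableName;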
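The statements above only register partitions. They do not derive the partition values from the fields nested inside col1, because Hive partition columns correspond to directories rather than to data inside the files. If the goal is to partition by the nested values themselves, one possible approach is to copy the data into a second table with a dynamic-partition insert. The following is a minimal sketch, not a definitive implementation. It assumes the raw JSON is still exposed through the unpartitioned logs table from the question; the table name logs_partitioned and the ORC storage format are hypothetical choices made for illustration.

```sql
-- Allow Hive to create partitions from the values returned by the SELECT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Hypothetical target table; the name and ORC storage format are assumptions.
CREATE TABLE logs_partitioned (
  `col1` struct<`country`:string,`page`:string,`date`:string>
)
PARTITIONED BY (country string, page string, `date` string)
STORED AS ORC;

-- The partition columns must come last in the SELECT list,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE logs_partitioned PARTITION (country, page, `date`)
SELECT col1,
       col1.country,
       col1.page,
       col1.`date`
FROM logs;

-- List the partitions that were created.
SHOW PARTITIONS logs_partitioned;
```

Partitioning on a high-cardinality field such as page can produce a very large number of small partitions, so it may be worth partitioning on country and date only.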