Hive: bucketing a partitioned table

Question

Here is my script:

--table without partition

drop table if exists ufodata;
create table ufodata ( sighted string, reported string, city string, shape string, duration string, description string )
row format delimited
fields terminated by '\t'
Location '/mapreduce/hive/ufo';

--load my data in ufodata

load data local inpath '/home/training/downloads/ufo_awesome.tsv' into table ufodata;

--create partition table
drop table if exists partufo;
create table partufo ( sighted string, reported string, city string, shape string, duration string, description string )
partitioned by ( year string )
clustered by (year) into 6 buckets
row format delimited
fields terminated by '\t';

--by default dynamic partition is not set
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
--by default bucketing is false
set hive.enforce.bucketing=true;

--load my data
insert overwrite table partufo
partition (year)
select sighted, reported, city, shape, duration, description, SUBSTR(TRIM(sighted), 1,4) from ufodata;

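Error message:

FAILED: Error in semantic analysis: Invalid column reference

I am trying to bucket a partitioned table. If I remove "clustered by (year) into 6 buckets", the script works fine. How can I bucket a partitioned table?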

Answer 1

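There is one important thing to keep in mind when bucketing in Hive.

The same column cannot be used for both bucketing and partitioning. Here is why:

Clustering and sorting happen within a partition. Inside each partition there is only one value of the partition column (year, in your case), so clustering or sorting on it would have no effect: every row of a partition would land in the same bucket. That is why you get the error. Bucket on one of the regular (non-partition) columns instead.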
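A minimal sketch of the question's table with the bucketing moved to an ordinary column. Bucketing on city is an assumption made for illustration; any non-partition column with many distinct values would work. The sketch assumes Hive 1.x or earlier, where hive.enforce.bucketing exists; in Hive 2.0 and later that property was removed (bucketing is always enforced), so drop that line there.

```sql
-- Partition by year, bucket by city (city is a hypothetical choice of bucket column).
drop table if exists partufo;
create table partufo ( sighted string, reported string, city string, shape string, duration string, description string )
partitioned by ( year string )
clustered by (city) into 6 buckets
row format delimited
fields terminated by '\t';

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- Hive 1.x and earlier only; remove this line on Hive 2.x.
set hive.enforce.bucketing=true;

-- The dynamic partition value (year) is the last expression in the select list.
insert overwrite table partufo
partition (year)
select sighted, reported, city, shape, duration, description, SUBSTR(TRIM(sighted), 1, 4)
from ufodata;
```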

Answer 2

You can create a bucketed table with partitions using the following syntax:

CREATE TABLE bckt_movies
(mov_id BIGINT , mov_name STRING ,prod_studio STRING, col_world DOUBLE , col_us_canada DOUBLE , col_uk DOUBLE , col_aus DOUBLE)
PARTITIONED BY (rel_year STRING)
CLUSTERED BY(mov_id) INTO 6 BUCKETS;

Answer 3

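For dynamic partitioning, first create a temporary (staging) table that contains all the columns, including the partition column, and load the data into it.

Then create the actual partitioned table with the partition column declared in PARTITIONED BY. When you load data from the staging table, the partition column must come last in the SELECT clause.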
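As a sketch of these steps, the following fills the bckt_movies table from the previous answer through a staging table. The staging table name movies_stage, its tab-delimited format, and the input path are hypothetical; the hive.enforce.bucketing line again applies only to Hive 1.x and earlier.

```sql
-- Hypothetical staging table: the bckt_movies columns plus rel_year as an ordinary column.
create table movies_stage
(mov_id BIGINT, mov_name STRING, prod_studio STRING, col_world DOUBLE, col_us_canada DOUBLE, col_uk DOUBLE, col_aus DOUBLE, rel_year STRING)
row format delimited
fields terminated by '\t';

-- Hypothetical input file.
load data local inpath '/path/to/movies.tsv' into table movies_stage;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.enforce.bucketing=true;  -- Hive 1.x and earlier only

-- rel_year is listed last, so Hive uses it as the dynamic partition value.
insert overwrite table bckt_movies
partition (rel_year)
select mov_id, mov_name, prod_studio, col_world, col_us_canada, col_uk, col_aus, rel_year
from movies_stage;
```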

Answer 1: score 1, 2015-10-15 10:30:00
Answer 2: score 0, 2015-10-15 07:16:54
Answer 3: score 0, 2016-07-15 23:05:46