
Hive: Need to specify partition columns because the destination table is partitioned

I don't know whether it's possible in Hive to insert from a non-partitioned table into a partitioned one. The first table is as follows:

hive> describe extended user_ratings;
OK
userid                  int                                         
movieid                 int                                         
rating                  int                                         
unixtime                int                                         

Detailed Table Information  Table(tableName:user_ratings, dbName:ml, owner:cloudera, createTime:1500142667, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/ml.db/user_ratings, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=    , field.delim=
Time taken: 0.418 seconds, Fetched: 6 row(s)

And the new table is:

hive> describe extended rating_buckets;
OK
userid                  int                                         
movieid                 int                                         
rating                  int                                         
unixtime                int                                         
genre                   string                                      

# Partition Information      
# col_name              data_type               comment             

genre                   string                                      

Detailed Table Information  Table(tableName:rating_buckets, dbName:default, owner:cloudera, createTime:1500506879, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null), FieldSchema(name:movieid, type:int, comment:null), FieldSchema(name:rating, type:int, comment:null), FieldSchema(name:unixtime, type:int, comment:null), FieldSchema(name:genre, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/user/hive/warehouse/rating_buckets, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:8, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=  , field.delim=
Time taken: 0.46 seconds, Fetched: 12 row(s)

It seems the partition column ("genre") is being treated the same as the other columns... did I maybe create it wrong?

Anyway, here's what happens when I try an INSERT OVERWRITE into the new table:

hive> FROM ml.user_ratings
    > INSERT OVERWRITE TABLE rating_buckets
    > select userid, movieid, rating, unixtime;
FAILED: SemanticException 2:23 Need to specify partition columns because the destination table is partitioned. Error encountered near token 'rating_buckets'

Should I just recreate the first table with partitions? Is there a way to copy the first table and keep the partitions intact?

You haven't even included genre in your select list. I believe it has to come last, and you can't just leave it out.

You also need to specify the partition along with the table, like this:

insert overwrite table rating_buckets partition(genre)
select
  userid,
  movieid,
  rating,
  unixtime,
  <SOMETHING> as genre
from
...
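For completeness, a minimal sketch of a full statement. Two assumptions are made here: dynamic partitioning must be enabled (by default Hive's strict mode rejects a fully dynamic partition insert), and since ml.user_ratings has no genre column, the constant 'unknown' is a hypothetical stand-in for whatever expression actually supplies the genre:

```sql
-- Dynamic partition inserts are disabled under strict mode by default.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- 'unknown' is a placeholder value: ml.user_ratings has no genre
-- column, so a real query would compute or join in the genre.
INSERT OVERWRITE TABLE rating_buckets PARTITION (genre)
SELECT userid, movieid, rating, unixtime,
       'unknown' AS genre
FROM ml.user_ratings;
```

Alternatively, with a static partition you name the value in the PARTITION clause and leave genre out of the select list entirely, e.g. `PARTITION (genre='unknown')`; no dynamic-partition settings are needed in that case.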


Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site or the original source. For any questions contact: yoyou2525@163.com.
