
Loading data incrementally into a bucketed table in Hive?

I am still learning Hive. I have referred to a few books to understand the concept of buckets in Hive. What I learnt is that if we enforce bucketing, Hive creates exactly as many files as there are buckets.

In my case, I will load data incrementally into a bucketed table five times a day. For example, if I have a table with 16 buckets, each load will create 16 files based on the hash of the bucketing column. So for 5 runs, 80 files will be created in total.

My question is: if I have a table with 16 buckets defined on it and 80 files in HDFS, will it still give the bucketing benefits?
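
To make the file counts in the question concrete, here is a small Python sketch of the arithmetic. It assumes, as the question does, that every load appends one new file per bucket, and it uses a toy hash-modulo rule as a stand-in for Hive's own bucketing hash. The file names and the sample keys are hypothetical.

```python
from collections import defaultdict

NUM_BUCKETS = 16   # buckets declared on the table (from the question)
LOADS_PER_DAY = 5  # incremental loads per day (from the question)


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Toy bucket assignment: non-negative hash of the key, modulo the bucket count.

    Assumption: this only illustrates the idea; Hive's real hash depends on
    the column type.
    """
    return (hash(key) & 0x7FFFFFFF) % num_buckets


def simulate_loads(loads):
    """Return {bucket_id: [file names]}, assuming each load appends one file
    for every bucket that receives at least one row."""
    files = defaultdict(list)
    for load_no, keys in enumerate(loads):
        touched = {bucket_for(k) for k in keys}
        for b in sorted(touched):
            # Hypothetical file name, not Hive's actual naming scheme
            files[b].append(f"bucket_{b:02d}_load_{load_no}")
    return dict(files)


# Five hypothetical loads of 1000 integer keys each
loads = [range(i * 1000, (i + 1) * 1000) for i in range(LOADS_PER_DAY)]
files_by_bucket = simulate_loads(loads)
total_files = sum(len(names) for names in files_by_bucket.values())

print(len(files_by_bucket), total_files)  # prints "16 80"
```

Under these assumptions the 80 files still fall into 16 groups of 5, one group per bucket, because the same key always maps to the same bucket number in every load.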

Do you create a different table for each incremental load?

Which Hadoop distribution are you using?

I'm using the same strategy, and every incremental load generates (and overwrites) the same number of bucket files that I defined.

When we had permission problems, we ended up with duplicated files, because the Hive tables were created by the hive user and populated by another user (hdfs).

Look in your /user/hive/warehouse directory at the owner and permissions of the table directory, then check the same (owner and permissions) in its subdirectories.
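
The ownership check above can also be scripted. This is only a rough sketch: it assumes you have captured the text output of `hdfs dfs -ls -R` on the table directory and that each entry has the usual eight whitespace-separated columns (permissions, replication, owner, group, size, date, time, path). The sample listing, table name, and file names below are made up for illustration.

```python
def parse_ls_line(line):
    """Parse one line of `hdfs dfs -ls` output into a dict, or None if it is not an entry."""
    parts = line.split(None, 7)  # keep any spaces inside the path intact
    if len(parts) < 8:
        return None  # header such as "Found 3 items", or a blank line
    perms, _replication, owner, group, _size, _date, _time, path = parts
    return {"perms": perms, "owner": owner, "group": group, "path": path}


def paths_not_owned_by(listing, expected_owner):
    """Return the paths in the listing whose owner differs from expected_owner."""
    entries = (parse_ls_line(line) for line in listing.splitlines())
    return [e["path"] for e in entries if e and e["owner"] != expected_owner]


# Hypothetical listing: one file was written by the hdfs user instead of hive
SAMPLE_LISTING = """\
drwxr-xr-x   - hive hive          0 2020-01-01 10:00 /user/hive/warehouse/my_table
-rw-r--r--   3 hive hive       1024 2020-01-01 10:05 /user/hive/warehouse/my_table/000000_0
-rw-r--r--   3 hdfs hive       1024 2020-01-01 12:05 /user/hive/warehouse/my_table/000000_0_copy_1
"""

suspicious = paths_not_owned_by(SAMPLE_LISTING, expected_owner="hive")
print(suspicious)
```

Any path this reports was written by a different user than the one that owns the table, which is the situation that produced duplicated files in our case.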
