
External Hive Table with Partition not reading the file

I created the external Hive table shown below and tried to read the file at its location.

Sample input:

c1,c2,c3,c4,c5
ass,adda,ada,er,asa
asdasd,asd,asas,qwqw,dfdf

External table with partitions:

create external table tablename(field3 varchar(50), field4 varchar(50), filed5 varchar(50))
partitioned by (field1 varchar(50), field2 varchar(50))
ROW FORMAT DELIMITED
fields terminated by ','
lines terminated by '\n'
location '/path/to/Folder/'
tblproperties ("skip.header.line.count"="1");

The folder location contains only one CSV file, which has 5 columns and 1k rows.

After creating the table, I ran a select query, but no results are shown.

Then I created the external table without partitioning, as below, and I do get output when I run a select query.

create external table tablename(field1 varchar(50), field2 varchar(50),field3 varchar(50), field4 varchar(50), filed5 varchar(50))
    ROW FORMAT DELIMITED
    fields terminated by ','
    lines terminated by '\n'
    location '/path/to/Folder/'
    tblproperties ("skip.header.line.count"="1");

I don't know where I am making a mistake. I am pretty new to Hive, so kindly help me.

As far as I know, when you load data into Hive from a non-Hive source, Hive maps the fields in the order in which they appear in the source data. So if the Hive table is partitioned, only the last columns of the source data can be used as partition columns.

In your case I am not sure why you get no output at all, although any output would be wrong anyway: in your partitioned table, field1 would hold the values of field4 and field2 would hold the values of field5.

The only indirect (and not ideal) way I know is to first create the non-partitioned table, as you did, and then copy the data from the non-partitioned table into the partitioned table. If that takes too much space (even though you will delete the non-partitioned table afterwards), then I guess you need to change your source data so that it provides the partition fields.
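A minimal sketch of that copy step, assuming dynamic partitioning is allowed on your cluster. The table names tablename_staging and tablename_partitioned are hypothetical; the staging table is just the non-partitioned DDL from the question, and the target is created as a managed table so that it does not share the CSV folder.

```sql
-- Staging table: non-partitioned, reads the CSV exactly as in the question.
-- (tablename_staging is a hypothetical name.)
CREATE EXTERNAL TABLE tablename_staging (
  field1 VARCHAR(50), field2 VARCHAR(50),
  field3 VARCHAR(50), field4 VARCHAR(50), field5 VARCHAR(50))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION '/path/to/Folder/'
TBLPROPERTIES ("skip.header.line.count"="1");

-- Partitioned target (hypothetical name). No LOCATION clause, so Hive
-- stores it under its own warehouse directory, not in the CSV folder.
CREATE TABLE tablename_partitioned (
  field3 VARCHAR(50), field4 VARCHAR(50), field5 VARCHAR(50))
PARTITIONED BY (field1 VARCHAR(50), field2 VARCHAR(50));

-- Let Hive derive the partition values from the selected rows.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns must come last in the SELECT list,
-- in the same order as in the PARTITIONED BY clause.
INSERT OVERWRITE TABLE tablename_partitioned PARTITION (field1, field2)
SELECT field3, field4, field5, field1, field2
FROM tablename_staging;
```

With high-cardinality partition columns this can create a very large number of small partitions, so it is worth checking how many distinct (field1, field2) pairs the file contains first.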

Use the command MSCK REPAIR TABLE <db_name>.<table_name>; it adds the table's partition metadata to the Hive metastore if that metadata does not exist there yet.
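One caveat, as far as I understand it: the repair command only discovers partitions whose directories follow the key=value layout under the table location, and the partition values come from the directory names, not from the file contents. A file sitting directly in /path/to/Folder/ belongs to no partition, which would explain the empty result. A rough sketch, where the partition values 'a' and 'b' and the file name data.csv are made up for illustration:

```sql
-- Assumed directory layout (values 'a' and 'b' are hypothetical):
--   /path/to/Folder/field1=a/field2=b/data.csv
-- The file inside a partition directory holds only the three
-- non-partition columns (field3, field4, filed5).

-- Scan the table location and register any partition directories
-- that are missing from the metastore.
MSCK REPAIR TABLE tablename;

-- Check which partitions the metastore now knows about.
SHOW PARTITIONS tablename;

-- Alternative: register a single partition explicitly.
ALTER TABLE tablename ADD IF NOT EXISTS
  PARTITION (field1='a', field2='b')
  LOCATION '/path/to/Folder/field1=a/field2=b';
```

If SHOW PARTITIONS returns nothing, a select on the partitioned table will return no rows either, regardless of what files are in the folder.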
