Loading CSV file on Hive Table with String Array

I am trying to load a CSV file into Hive where one field is an array of strings.

Here is the CSV file:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]
99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]

I tried creating a table like this:

CREATE TABLE IF NOT EXISTS Article
(
ARTICLE_ID INT,
ARTICLE_NAME STRING,
ARTICLE_AUTHOR STRING,
ARTICLE_GENRE ARRAY<STRING>
);
LOAD DATA INPATH '/tmp/pinterest/article.csv' OVERWRITE INTO TABLE Article;
select * from Article;  

Here is the output I get:

article.article_id  article.article_name    article.article_author  article.article_genre
48  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Health&Fitness"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["[Photo"]

It is taking only one value in the last field, article_genre.

Can someone point out what is wrong here?

A couple of things:
You are missing the definition of a delimiter for collection items.
Also, I assume you expect your select * from article statement to return something like this:

48  Snacks that Power Up Weight Loss    Aidan B. Prince ["Health&Fitness","Travel"]
99  Snacks that Power Up Weight Loss    Aidan B. Prince ["Photo","Travel"]
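To see where the truncated value in the question comes from, here is a rough shell sketch. It is not Hive itself, only an imitation of the comma splitting. It assumes the original table was splitting fields on ',' (the DDL in the question does not show this, but the output suggests it), and nth_field is a hypothetical helper name. Since ',' separates both the fields and the array items, the array column only ever sees the text up to the next comma.

```shell
#!/bin/sh
# Hypothetical helper: print the Nth comma-separated field of each input line.
# This only imitates splitting a row on ','; it is not how Hive is implemented.
nth_field() {
  cut -d, -f"$1"
}

row='48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]'

# The fourth field stops at the comma inside the brackets.
printf '%s\n' "$row" | nth_field 4   # prints: [Health&Fitness

# The rest of the "array" lands in a fifth field that the table has no column for.
printf '%s\n' "$row" | nth_field 5   # prints: Travel]
```

With no collection delimiter defined, the fourth piece is then kept as a single array element, which matches the ["[Health&Fitness"] seen in the question.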

I can give you an example, and you can adapt the rest yourself. Here is my table definition:

create table article (
  id int,
  name string,
  author string,
  genre array<string>
)
row format delimited
fields terminated by ','
collection items terminated by '|';

And here is the data:

48,Snacks that Power Up Weight Loss,Aidan B. Prince,Health&Fitness|Travel
99,Snacks that Power Up Weight Loss,Aidan B. Prince,Photo|Travel
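If the file already exists in the bracketed form from the question, one possible way to get it into this layout is a small sed filter. This is a minimal sketch, not a general CSV parser. It assumes the bracketed list is the last field on each line, that no other field contains '[', and that no item contains ',', '|' or ']'. The function name brackets_to_pipes is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: rewrite a trailing [a,b,c] field as a|b|c.
# Reads lines on stdin and writes the converted lines to stdout.
brackets_to_pipes() {
  # Loop: turn the first comma after '[' into '|' until none are left
  # inside the brackets, then strip the surrounding '[' and ']'.
  sed -E \
    -e ':a' \
    -e 's/(\[[^],]*),/\1|/' \
    -e 'ta' \
    -e 's/\[([^]]*)\]$/\1/'
}

printf '%s\n' \
  '48,Snacks that Power Up Weight Loss,Aidan B. Prince,[Health&Fitness,Travel]' \
  '99,Snacks that Power Up Weight Loss,Aidan B. Prince,[Photo,Travel]' |
  brackets_to_pipes
# prints:
# 48,Snacks that Power Up Weight Loss,Aidan B. Prince,Health&Fitness|Travel
# 99,Snacks that Power Up Weight Loss,Aidan B. Prince,Photo|Travel
```

For a real file, something like brackets_to_pipes < article.csv > article_fixed.csv would write a converted copy (both file names are placeholders) and leave the original untouched.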

Now do a load like:
LOAD DATA local INPATH '/path' OVERWRITE INTO TABLE article;
and run a select statement to check the result.

Most important point:
define a delimiter for collection items, and don't write the array in the data file with the bracket syntax you would use in a normal programming language.
Also, make the field delimiter different from the collection items delimiter to avoid confusion and unexpected results.

To insert an array of strings into a Hive table, we need to take care of the points below.

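 1. When creating the Hive table, collection items should be terminated by "," ('colelction.delim'=','). Older Hive releases really do spell this key colelction.delim. This only works if the fields themselves use a different delimiter; the sample row below is not comma-separated.
 2. The data in the file should look like this:
  48  Snacks that Power Up Weight Loss    Aidan B. Prince Health&Fitness,Travel
You can modify the file by running the sed commands below in the following order. They assume the array is written with quotes, as in ["a","b"]: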
 - sed -i 's/\[\"//g' filename
 - sed -i 's/\"\]//g' filename
 - sed -i 's/"//g' filename
