
PIG mapreduce output and HIVE

I have a file called test.txt with the records below (disregard the dots):

"1" "a" "x" 
"2" "b" "y" 
"3" "c" "z"

(tab as field separator)

My Pig script (test.pig):

a=LOAD '/Analytics/warehouse/SF/test.txt' as (fullrecord:chararray);

b=FOREACH a generate REPLACE($0,'\t',',');

STORE b INTO 'hdfs://localhost:9000/Analytics/warehouse/SF/sf.out' USING PigStorage(',');

I run the script with: pig -x mapreduce test.pig

The output:

.../warehouse/SF/sf.out

part-m-0000

And the content is only:

"1"
"2"
"3"

Q1- What happened to the other fields?

Q2- Why weren't the tab characters replaced by ","?

Q3- How can I achieve the following result?

"1","a","x" 
"2","b","y" 
"3","c","z"

Q4- How can I query that result with Hive?

What happened to the other fields

LOAD defaults to PigStorage with a tab delimiter, so each line was split on tabs into three fields. Your schema declares a single field and your GENERATE references only $0, so only the first column was kept. To load the whole line as one field you need USING PigStorage('\n') (TextLoader() is another option). Alternatively, skip that entirely: remove the FOREACH, let LOAD split the line on tabs into all three fields (declare three fields in the schema, or omit it), and STORE with PigStorage(',').
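As a rough sketch of the simpler route (no FOREACH), assuming every record has exactly three tab-separated columns. The field names f1, f2 and f3 are made up for illustration; the paths are the ones from the question:

```pig
-- With no USING clause, LOAD uses PigStorage and splits each line on tabs.
-- f1, f2, f3 are hypothetical names; the schema must cover all three columns,
-- otherwise the extra columns are dropped again.
a = LOAD '/Analytics/warehouse/SF/test.txt'
    AS (f1:chararray, f2:chararray, f3:chararray);

-- PigStorage(',') joins the fields of each tuple with commas on output.
STORE a INTO 'hdfs://localhost:9000/Analytics/warehouse/SF/sf.out'
    USING PigStorage(',');
```

The double quotes are part of the field values, so they are written back unchanged. Note that STORE fails if the output directory already exists, so the sf.out left over from the earlier run would have to be removed first.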

Why the tab characters weren't changed

This follows from the above: the line had already been split on tabs and only the first field was kept, so $0 contained no tab character and REPLACE had nothing to replace.
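If you would rather keep the REPLACE approach, the line has to reach the FOREACH with its tabs intact. A minimal sketch, assuming the built-in TextLoader (which reads each line as a single chararray) is acceptable in place of PigStorage('\n'):

```pig
-- TextLoader does no field splitting: each record is one chararray,
-- tabs included.
a = LOAD '/Analytics/warehouse/SF/test.txt'
    USING TextLoader() AS (fullrecord:chararray);

-- Now $0 is the full line, so there are tabs for REPLACE to match.
b = FOREACH a GENERATE REPLACE($0, '\t', ',');

-- b has a single field, so the PigStorage delimiter is never actually used.
STORE b INTO 'hdfs://localhost:9000/Analytics/warehouse/SF/sf.out'
    USING PigStorage(',');
```

Here the commas come from REPLACE rather than from the storage delimiter, which is why the argument to PigStorage on the STORE line makes no difference.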

How can I query that result with Hive?

Use HCatalog instead of PigStorage to write straight into an existing Hive table:

STORE data_alias INTO 'tablename' USING org.apache.hcatalog.pig.HCatStorer();

Then query the table from Hive.

Otherwise, you would need to define a Hive external table over the HDFS output directory, with fields terminated by ','.

You can also skip Pig altogether; Hive can query tab-delimited files directly if the table is declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE.
