
Read Snappy-compressed Hive RCFile in Apache Pig

I'm trying to read Hive files in Pig using HiveColumnarLoader: http://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/HiveColumnarLoader.html

The files begin with the strings RCF, SnappyCodec and hive.io.rcfile.column.number; otherwise they are binary. Moreover, they are partitioned across multiple directories (like /day=20140701).
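Before blaming the loader, it can help to confirm both observations above on a local copy of one data file (for example fetched with hdfs dfs -get). The Python sketch below is a diagnostic heuristic, not an RCFile parser: it only checks that the header starts with the RCF magic and contains the codec and metadata strings as plain text, and it lists the Hive-style key=value partition segments in a path. The function names are hypothetical.

```python
from urllib.parse import unquote

# Plain-text markers the question reports at the start of each file.
RCFILE_MAGIC = b"RCF"
SNAPPY_MARKER = b"SnappyCodec"
COLUMN_COUNT_KEY = b"hive.io.rcfile.column.number"


def inspect_rcfile_header(head: bytes) -> dict:
    """Heuristic check of a file's first bytes; does not decode the header."""
    return {
        "is_rcfile": head.startswith(RCFILE_MAGIC),
        "snappy": SNAPPY_MARKER in head,
        "has_column_count": COLUMN_COUNT_KEY in head,
    }


def read_head(path: str, nbytes: int = 4096) -> bytes:
    """Read the first nbytes of a local copy of the data file."""
    with open(path, "rb") as f:
        return f.read(nbytes)


def partition_values(path: str) -> dict:
    """Extract Hive-style key=value partition segments from a path.

    Hive escapes special characters in partition values as %XX,
    so the values are unquoted here.
    """
    values = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = unquote(value)
    return values


if __name__ == "__main__":
    sample = (b"RCF\x01\x01" + b"org.apache.hadoop.io.compress.SnappyCodec"
              + b"\x00hive.io.rcfile.column.number")
    print(inspect_rcfile_header(sample))
    print(partition_values("/user/hive/warehouse/events/day=20140701/000000_0"))
```

If "snappy" comes back true, the Hadoop tasks that Pig launches may also need a loadable native Snappy library, which is worth ruling out separately from the loader itself.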

However, a simple script that loads, groups and counts the rows prints nothing to the output. If I add ILLUSTRATE like this:

rows = LOAD ... using HiveColumnarLoader ...;
ILLUSTRATE rows;

I get an error like this:

2014-07-17 14:16:43,086 [main] ERROR org.apache.pig.pen.AugmentBaseDataVisitor - No (valid) input data found!
java.lang.RuntimeException: No (valid) input data found!
    at org.apache.pig.pen.AugmentBaseDataVisitor.visit(AugmentBaseDataVisitor.java:583)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:229)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.depthFirst(PreOrderDepthFirstWalker.java:82)
    at org.apache.pig.pen.util.PreOrderDepthFirstWalker.walk(PreOrderDepthFirstWalker.java:66)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:180)
    at org.apache.pig.PigServer.getExamples(PigServer.java:1180)
...

I'm not sure whether this is caused by the Snappy compression or by a problem with how I specified the schema (I copied it from Hive's describe table output).
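Since the schema was copied from describe table, one thing worth checking is that only name/type pairs ended up in the loader's schema string; the HiveColumnarLoader documentation shows a comma-separated form such as 'uid bigint, ts long'. The Python sketch below (a hypothetical helper) builds that string from describe output, assuming Hive's usual tab-separated col_name / data_type / comment rows and its "# Partition Information" section. Dropping partition columns by default is an assumption based on the loader deriving them from the directory names; check the piggybank documentation for your version.

```python
def describe_to_schema(describe_output: str, drop_partition_cols: bool = True) -> str:
    """Turn `describe table` output into a 'name type, name type' string."""
    columns = []
    partition_cols = set()
    in_partition_section = False
    for raw in describe_output.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank separator rows
        if line.startswith("# Partition Information"):
            in_partition_section = True
            continue
        if line.startswith("#"):
            continue  # header rows such as "# col_name data_type comment"
        fields = line.split()
        if len(fields) < 2:
            continue
        name, dtype = fields[0], fields[1]  # the comment column is ignored
        if in_partition_section:
            partition_cols.add(name)
        else:
            columns.append((name, dtype))
    if drop_partition_cols:
        # describe lists partition columns in the main section too.
        columns = [(n, t) for n, t in columns if n not in partition_cols]
    return ", ".join(f"{n} {t}" for n, t in columns)


if __name__ == "__main__":
    sample = ("uid\tbigint\tNone\nts\tstring\tNone\nday\tstring\tNone\n\t \t \n"
              "# Partition Information\t \t \n# col_name\tdata_type\tcomment\n"
              "\t \t \nday\tstring\tNone\n")
    print(describe_to_schema(sample))
```

The resulting string would then be passed as the loader's constructor argument in place of a hand-copied schema.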

Could anyone please confirm whether HiveColumnarLoader works with Snappy-compressed files, or suggest another approach?

Thanks in advance!

Have you tried HCatLoader?

rows = LOAD 'tablename' using org.apache.hcatalog.pig.HCatLoader();
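HCatLoader can be combined with the load/group/count check from the question. A minimal Python sketch that generates such a Pig script and builds the command line is shown below; the helper names and the table name are hypothetical. Two things to verify locally: Pig is started with -useHCatalog so the HCatalog jars are on the classpath, and on newer Hive releases, where HCatalog ships inside Hive, the loader class is org.apache.hive.hcatalog.pig.HCatLoader rather than the org.apache.hcatalog one used above. COUNT_STAR is used because COUNT skips tuples whose first field is null.

```python
import subprocess
import tempfile

# Loader class from the answer; newer Hive releases use
# org.apache.hive.hcatalog.pig.HCatLoader instead.
DEFAULT_LOADER = "org.apache.hcatalog.pig.HCatLoader"


def hcat_count_script(table: str, loader: str = DEFAULT_LOADER) -> str:
    """Pig Latin that loads a Hive table through HCatalog and counts its rows."""
    return (
        f"rows = LOAD '{table}' USING {loader}();\n"
        "grouped = GROUP rows ALL;\n"
        "counts = FOREACH grouped GENERATE COUNT_STAR(rows);\n"
        "DUMP counts;\n"
    )


def pig_command(script_path: str) -> list:
    """Command line that runs the script with the HCatalog jars available."""
    return ["pig", "-useHCatalog", "-f", script_path]


def run(table: str) -> int:
    # Writes the script to a temp file (kept for inspection) and runs it;
    # requires a local Pig installation configured for the cluster.
    with tempfile.NamedTemporaryFile("w", suffix=".pig", delete=False) as f:
        f.write(hcat_count_script(table))
    return subprocess.call(pig_command(f.name))


if __name__ == "__main__":
    print(hcat_count_script("tablename"))
```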
