
Reading Snappy-compressed input in Pig

I have a file compressed with Snappy that I am trying to load into Pig. I set the configuration options in grunt as described in this JIRA issue, but I am still getting the compressed data in the results.

When I run the job, it does say: org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library is available

For the job I do a simple:
a = load '/path/to/snappy/file' using PigStorage() as (x, y, z);

then:
dump a;

will output the compressed data.

Does anyone know what I can do to read the data correctly? Thanks in advance.

PigStorage uses PigTextInputFormat for input, which will detect and decompress Snappy-compressed files, but the files must have the correct extension for Hadoop's compression codec factory to know to use Snappy.

My guess is that your files don't have the .snappy extension. Try renaming the files and running the job again.
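To illustrate why the extension matters, here is a rough Python sketch of extension-based codec selection. It is a simplified model, not Hadoop's actual code: the function name `pick_codec` and the lookup table are made up for illustration, and only the codec class names and their default extensions come from Hadoop.

```python
# Simplified model of extension-based codec selection; NOT Hadoop's actual code.
from typing import Optional

# Default file extensions of some standard Hadoop codecs.
CODEC_BY_EXTENSION = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}


def pick_codec(path: str) -> Optional[str]:
    """Return the codec class name whose extension matches the file name, or None."""
    filename = path.rsplit("/", 1)[-1]  # keep only the last path component
    for ext, codec in CODEC_BY_EXTENSION.items():
        if filename.endswith(ext):
            return codec
    return None  # no match: the input is treated as plain, uncompressed text


if __name__ == "__main__":
    print(pick_codec("/path/to/snappy/file"))         # no extension, so no codec
    print(pick_codec("/path/to/snappy/file.snappy"))  # matches the Snappy codec
```

Under this model, a file without a recognized extension gets no codec, so its raw compressed bytes are passed through as if they were text, which matches the symptom in the question. Renaming the file in HDFS (for example with hadoop fs -mv) so that it ends in .snappy lets the lookup succeed.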
