Reading compressed (.xz) file in Apache Pig

I am trying to use a Pig script to read a .xz file that was compressed with the hadoop-xz codec.

The sample code I tried is:

REGISTER hadoop-xz-1.4.jar
SET output.compression.enabled true;
SET output.compression.codec io.sensesecure.hadoop.xz.XZCodec;

msg = LOAD 'pigtest/newXZ.xz' USING PigStorage();
STORE msg INTO 'pigtest/output' USING PigStorage();
DUMP msg;

The result is still in a compressed format. Am I doing something wrong, or do I have to use XZInputStream inside Pig?

The running environment is HortonWorks Sandbox 2.2 (Hue).

It depends on what you want to do.

It seems you want to read an XZ file, so I would assume you need to set up the input codec, not the output one.

I'm not a Pig user, but from what I gather it cannot easily handle custom compression codecs (unlike Hive and Streaming, for example).
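
As a rough, untested sketch of what configuring the input side might look like: the script in the question only sets output.compression.*, which controls how STORE writes its output, so compressed output is the expected result of those two lines. The version below drops them and instead adds XZCodec to the Hadoop io.compression.codecs property. Whether Pig on this sandbox then picks the codec by the .xz extension when loading is an assumption, not something confirmed here; the list of other codecs is also only illustrative.

```pig
REGISTER hadoop-xz-1.4.jar;

-- Assumption: listing XZCodec in io.compression.codecs lets the underlying
-- Hadoop text input format choose it for files ending in .xz.
SET io.compression.codecs 'org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,io.sensesecure.hadoop.xz.XZCodec';

-- No output.compression.* settings here, so STORE is expected to write plain text.
msg = LOAD 'pigtest/newXZ.xz' USING PigStorage();
STORE msg INTO 'pigtest/output' USING PigStorage();
```

If the load still returns compressed bytes, the codec is probably not being picked up on the input side, and the property may need to be set in the cluster's core-site.xml instead.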
