Reading a compressed (.xz) file in Apache Pig
I am trying to use a Pig script to read a .xz file that was compressed with the hadoop-xz codec.
The sample code I tried is:
REGISTER hadoop-xz-1.4.jar;
SET output.compression.enabled true;
SET output.compression.codec io.sensesecure.hadoop.xz.XZCodec;
msg = LOAD 'pigtest/newXZ.xz' USING PigStorage();
STORE msg INTO 'pigtest/output' USING PigStorage();
DUMP msg;
The result is still in a compressed format. Am I doing something wrong, or do I have to use XZInputStream inside Pig?
The running environment is HortonWorks Sandbox 2.2 (Hue).
It depends on what you want to do.
It seems you want to read an XZ file, so I would assume you need to set up the input codec, not the output one.
I'm not a Pig user, but from what I gather it cannot easily handle custom compression codecs (unlike Hive and Streaming, for example).
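As a rough, untested sketch of the input-codec suggestion above: the two output.compression.* settings in the question apply to what STORE writes, which may be why the stored result is still compressed. One thing to try is dropping them and registering the XZ codec in Hadoop's standard io.compression.codecs property, so that the codec can be picked by the .xz file extension on read. That Pig passes this property through to the loader behind PigStorage is an assumption here, not something confirmed in this thread.

```pig
-- Sketch only: assumes Pig forwards this Hadoop property to the input format
-- used by PigStorage, and that hadoop-xz-1.4.jar is on the job classpath.
REGISTER hadoop-xz-1.4.jar;

-- Add the XZ codec to the list Hadoop consults when matching file extensions.
SET io.compression.codecs 'org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,io.sensesecure.hadoop.xz.XZCodec';

-- No output.compression.* settings here, so STORE should write plain text.
msg = LOAD 'pigtest/newXZ.xz' USING PigStorage();
STORE msg INTO 'pigtest/output' USING PigStorage();
```

If the output is still unreadable with this script, the codec is probably not being applied on read, which would match the caveat above about Pig and custom compression.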