
Reading Snappy-compressed input in Pig

Original post: 2013-01-23 20:27:53 · tags: hadoop, apache-pig, snappy

I have a Snappy-compressed file that I am trying to load into Pig. I set the configuration options in grunt as described in this JIRA issue, but I still get the compressed data in the results.

When I run the job, the log does say: org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library is available

For the job, I do a simple

a = load '/path/to/snappy/file' using PigStorage() as (x, y, z);

then:

dump a;

which outputs the compressed data.

Does anyone know what I can do to read the data correctly? Thanks in advance.

1 answer

PigStorage uses PigTextInputFormat for input, which will detect and decompress Snappy-compressed files, but the files must have the correct extension for the Hadoop compression codec factory to know to use Snappy.
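To illustrate why the extension matters, here is a minimal Python sketch that roughly mimics how Hadoop's compression codec factory picks a codec from the file name alone. The suffix table is an assumption modeled on Hadoop's default codec extensions (on a real cluster the registered codecs come from the `io.compression.codecs` property), and `codec_for` is a hypothetical helper, not a Hadoop API.

```python
from typing import Optional

# Assumed suffix -> codec table, modeled on Hadoop's default extensions.
# On a real cluster the registered codecs come from io.compression.codecs.
CODEC_BY_SUFFIX = {
    ".snappy": "SnappyCodec",
    ".gz": "GzipCodec",
    ".bz2": "BZip2Codec",
    ".deflate": "DefaultCodec",
}


def codec_for(path: str) -> Optional[str]:
    """Pick a codec from the file name only; None means 'read as plain text'."""
    name = path.rsplit("/", 1)[-1]  # only the last path component is checked
    best_suffix = None
    for suffix in CODEC_BY_SUFFIX:
        # Prefer the longest matching suffix if several could apply.
        if name.endswith(suffix) and (best_suffix is None or len(suffix) > len(best_suffix)):
            best_suffix = suffix
    return CODEC_BY_SUFFIX[best_suffix] if best_suffix else None


print(codec_for("/path/to/snappy/file"))         # None: bytes passed through undecoded
print(codec_for("/path/to/snappy/file.snappy"))  # SnappyCodec
```

Because the file contents are never inspected, the same Snappy bytes are either decompressed or passed through verbatim depending only on the name, which matches the symptom described in the question.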

My guess is that your files don't have the .snappy extension. Try renaming the files and running the job again.
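A minimal sketch of the renaming step, assuming the only problem with the files is the missing suffix. `rename_plan` and the `/data/part-*` paths are hypothetical; the sketch only prints `hadoop fs -mv` commands instead of running them, so the plan can be reviewed first.

```python
from typing import Dict, Iterable

SNAPPY_SUFFIX = ".snappy"


def rename_plan(paths: Iterable[str]) -> Dict[str, str]:
    """Map every path that lacks the .snappy suffix to a renamed target."""
    plan: Dict[str, str] = {}
    for path in paths:
        if not path.endswith(SNAPPY_SUFFIX):
            plan[path] = path + SNAPPY_SUFFIX
    return plan


# Hypothetical input files; only the first one needs renaming.
for src, dst in rename_plan(["/data/part-00000", "/data/part-00001.snappy"]).items():
    print(f"hadoop fs -mv {src} {dst}")
# prints: hadoop fs -mv /data/part-00000 /data/part-00000.snappy
```

After the move, the original `load` statement can point at the renamed file (or its directory) without other changes.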

