
Writing snappy compressed data to a Hive table

I've created a Hive table and now I want to load snappy compressed data into it. Therefore I did the following:

SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress=true;
CREATE TABLE toydata_table (id STRING, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ",";

Then I created a CSV file called toydata.csv with the following content:

A,Value1
B,Value2
C,Value3

I compressed this file with snzip ( https://github.com/kubo/snzip ) by running

/usr/local/bin/snzip -t snappy-java toydata.csv

which produces toydata.csv.snappy. After that I returned to the Hive CLI and loaded the data with:

    LOAD DATA LOCAL INPATH "toydata.csv.snappy" INTO TABLE toydata_table;

But when I try to query the table, I get the following error message:

hive> select * from toydata_table;
OK
Exception in thread "main" java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
    at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method)
    at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:62)
    at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:189)
    at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:175)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:433)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:515)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:489)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:136)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1471)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:271)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:216)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:614)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

I did the exact same thing with gzip, and gzip works fine. So why does this part fail?

Please install the snappy compression codec on your cluster. If you want to confirm whether snappy is installed, look for the libsnappy.so file in your libraries. You also need to start the Hive shell with the --auxpath parameter and provide the snappy jar, e.g.:

    hive --auxpath /home/user/snappy1.0.4.1.jar
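
As a sanity check, here is a minimal sketch for verifying the native library, assuming a Hadoop 2.x installation with the hadoop command on the PATH (the library paths below are examples, not guaranteed to match your cluster):

    # Report which native codecs Hadoop could actually load; on a working
    # setup the output includes a line like:
    #   snappy:  true /usr/lib/hadoop/lib/native/libsnappy.so.1
    hadoop checknative -a

    # Locate the snappy shared library on the machine (search paths are examples)
    find /usr/lib /usr/local/lib -name 'libsnappy.so*' 2>/dev/null

    # If the library exists but Hadoop cannot load it, point the JVM at its
    # directory before starting Hive (the directory below is an assumption):
    export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native

If checknative reports snappy: false, the UnsatisfiedLinkError above is expected: the SnappyCodec class is on the classpath, but the native libsnappy library it delegates to is not available to the JVM.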
