
How to enable Snappy compression for already-loaded data in Hive?

I have terabytes of data in my Hive warehouse and am trying to enable Snappy compression for it. I know that we can enable Hive compression using

hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

while loading the data into Hive, but how do I compress the data that is already loaded?
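
(For context, this is how I apply those settings today when writing new data; this is only a sketch, and my_table / my_table_snappy are hypothetical table names. With output compression on, the files written by the INSERT come out Snappy-compressed even though they stay in text format:)

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;

-- copy of the source table's schema
CREATE TABLE my_table_snappy LIKE my_table;

-- rewriting the rows produces Snappy-compressed output files
INSERT OVERWRITE TABLE my_table_snappy
SELECT * FROM my_table;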

Hive's ORCFile format supports compressed storage. To convert existing data to ORCFile, create a new table with the same schema as the source table but stored as ORC, then insert the data into it:

CREATE TABLE A_ORC (
    customerID int, name string, ...etc
) STORED AS ORC TBLPROPERTIES ("orc.compress" = "SNAPPY");

INSERT INTO A_ORC SELECT * FROM A; 

Here A_ORC is the new table and A is the source table.
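
As a follow-up sketch (using the same table names; DESCRIBE FORMATTED and ALTER TABLE ... RENAME TO are standard HiveQL, and A_text_backup is just an illustrative name), you can confirm the compression property on the new table and then swap it in place of the original:

-- the Table Parameters section should show orc.compress=SNAPPY
DESCRIBE FORMATTED A_ORC;

-- once the data in A_ORC is verified, swap the tables so existing
-- queries against A read the compressed copy
ALTER TABLE A RENAME TO A_text_backup;
ALTER TABLE A_ORC RENAME TO A;
-- DROP TABLE A_text_backup;   -- only after the backup is no longer needed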

You can learn more about ORCFile in the Hive documentation.
