I have several terabytes of data in my Hive warehouse and am trying to enable Snappy compression for it. I know that we can enable Hive compression using
hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
while loading the data into Hive. But how do I compress the data that is already loaded?
Hive's ORC file format supports compressed storage. To convert existing data to ORC, create a new table with the same schema as the source table plus STORED AS ORC, then copy the data over. See below:
CREATE TABLE A_ORC (
customerID int, name string, ..etc
) STORED AS ORC TBLPROPERTIES ("orc.compress" = "SNAPPY");
INSERT INTO A_ORC SELECT * FROM A;
Here A_ORC is the new table and A is the source table.
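As a follow-up sketch (reusing the table names A and A_ORC from the example above), you can verify that the new table is actually using Snappy-compressed ORC, and then swap it in place of the original if everything looks right:

```sql
-- Inspect the table's storage details; the output should list
-- orc.compress = SNAPPY under Table Parameters:
DESCRIBE FORMATTED A_ORC;

-- Once the copied data is verified, drop the uncompressed source
-- table and rename the ORC table to take its name:
DROP TABLE A;
ALTER TABLE A_ORC RENAME TO A;
```

Note that the DROP TABLE step permanently removes the original data for a managed table, so only run it after confirming the row counts in A_ORC match the source.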