
ORC fileformat with Impala

Can the ORC file format be used in Impala? Also, how can I access an ORC table stored in the Hive metastore from Impala? I found the documentation link below, but it doesn't contain a list of restricted file formats or any mention of ORC not being supported by Impala: http://www.cloudera.com/documentation/enterprise/latest/topics/impala_file_formats.html

ORC is not supported in Impala. Rather, Apache Parquet is the recommended format for best performance.

Impala cannot read the ORC file format. If possible, I would suggest migrating your ORC files to Parquet with Hive. The advantage is that you pay the cost of setting up the MapReduce tasks only once.

If your ORC table is nameoforctable, then a very basic query looks like:

CREATE TABLE nameoforctable_parquet
LIKE nameoforctable
STORED AS PARQUET
LOCATION '/your/hdfs/location';

INSERT INTO nameoforctable_parquet
SELECT * FROM nameoforctable;
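
Because the Parquet table is created through Hive, Impala may not see it until its metadata cache is refreshed. The following is a minimal sketch of that follow-up step, assuming the table name used in the query above and that the statements are run from impala-shell; on clusters with automatic metadata synchronization the first statement may be unnecessary.

```sql
-- Run in impala-shell, not in Hive.
-- Make the table that was created through Hive visible to Impala.
INVALIDATE METADATA nameoforctable_parquet;

-- Sanity check: this count should match the same query run in Hive
-- against the original ORC table (nameoforctable).
SELECT COUNT(*) FROM nameoforctable_parquet;
```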

Even though ORC is the only format that supports the ACID feature in Hive, and has demonstrated better query performance and compression ratios in some benchmarking studies, Impala doesn't support the ORC file format because it was created by Hortonworks, one of Cloudera's major competitors. Conversely, the Hive version on Hortonworks Data Platform (HDP) does not support Parquet for the same reason.

Use the following command to create an ORC-format table in Impala:

create table orc_table_name_1 (x INT, y STRING) STORED AS orc;
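
Whether the statement above succeeds depends on the Impala version in use, which the answer does not state. If the table is created, one way to confirm the file format it actually uses is to inspect its metadata. This is a sketch that assumes the table name from the command above:

```sql
-- The storage section of the output lists the InputFormat,
-- OutputFormat and SerDe library recorded for the table.
DESCRIBE FORMATTED orc_table_name_1;
```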
