
Create a Hive table in ORC format from a file stored in HDFS

I want to know whether it is possible to create a Hive table in ORC format from a file stored in the Hadoop file system (users.tbl). I have read that ORC is better optimized than plain text. So, can I create a Hive table using STORED AS ORC together with the TBLPROPERTIES and LOCATION attributes, so that the table is built from the HDFS file but stored in ORC format?

Something like:

create table if not exists users
(USERID BIGINT,
 NAME STRING,
 EMAIL STRING,
 CITY STRING)
STORED AS ORC TBLPROPERTIES ("orc.compress"="SNAPPY")
LOCATION '/tables/users/users.tbl';

Instead of text:

create table if not exists users
    (USERID BIGINT,
     NAME STRING,
     EMAIL STRING,
     CITY STRING)
     ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' STORED AS TEXTFILE 
     LOCATION '/tables/users/users.tbl';

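You cannot do that in a single step. The CREATE TABLE statement does not process any data; it only specifies the format and the location. Pointing an ORC table at a text file does not convert the file, so Hive would simply fail to read it as ORC.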

My suggestion is to create a temporary staging table with the STORED AS TEXTFILE statement, and to create the final table with ORC as the storage format (pointing at an empty location).

Then insert all the rows from the temporary text table into the ORC table:

INSERT OVERWRITE TABLE orcTable SELECT col1, col2 FROM textTable;

INSERT OVERWRITE replaces all the data in the table with the new data. If you only want to append new data, use INSERT INTO TABLE ... instead.

After the import you can drop the temporary text table.
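The steps above can be sketched end to end as follows. This is only a sketch under some assumptions: the staging table name users_txt is made up for illustration, and users.tbl is assumed to sit in the directory /tables/users/, because LOCATION has to name a directory rather than a single file. The staging table is declared EXTERNAL so that dropping it leaves the original text file in place.

```sql
-- Staging table over the existing pipe-delimited text data.
-- Assumption: users.tbl lives in the directory /tables/users/.
-- users_txt is a hypothetical name chosen for this sketch.
CREATE EXTERNAL TABLE IF NOT EXISTS users_txt (
  USERID BIGINT,
  NAME STRING,
  EMAIL STRING,
  CITY STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/tables/users/';

-- Final table stored as ORC with Snappy compression.
-- No LOCATION is given, so Hive uses its default warehouse directory.
CREATE TABLE IF NOT EXISTS users (
  USERID BIGINT,
  NAME STRING,
  EMAIL STRING,
  CITY STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- This is the step that actually rewrites the rows as ORC.
INSERT OVERWRITE TABLE users
SELECT USERID, NAME, EMAIL, CITY FROM users_txt;

-- Removes only the metadata, since the staging table is external.
DROP TABLE users_txt;
```

Keeping the ORC table out of /tables/users/ avoids mixing text and ORC files in one directory, which Hive could not read as a single table.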

1. Create a table in Hive.

 create table MyDB.TEST (
 Col1 String,
 Col2 String,
 Col3 String,
 Col4 String)
 STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
 OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

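2. Load the data into the table. The files must already be in ORC format, because LOAD DATA only moves them and does not convert them.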

 LOAD DATA INPATH '/hdfs/dir/folder/to/orc/files/' INTO TABLE MyDB.TEST;
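A quick way to check the result, reusing the MyDB.TEST table from the steps above, might look like this. It is just a sanity check, not part of the original answer.

```sql
-- Shows the storage details; the InputFormat/OutputFormat lines
-- should list the ORC classes given in the CREATE TABLE statement.
DESCRIBE FORMATTED MyDB.TEST;

-- Reads a few rows back; this fails if the loaded files are not valid ORC.
SELECT * FROM MyDB.TEST LIMIT 10;
```

Note that LOAD DATA INPATH moves the files out of the source HDFS directory into the table's directory, so the source path will be empty afterwards.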

Just create your table on top of the existing data, as below:

CREATE EXTERNAL TABLE mytable
(
col1 bigint,
col2 bigint
) 
STORED AS ORC
LOCATION '<ORC file location>';

Please refer to this link:

https://community.hortonworks.com/questions/179897/hive-table-creation-from-orc-format-file.html

How about creating your table on top of your location and then running msck repair table table_name, so that your data is loaded into the table and ready to query?
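MSCK REPAIR TABLE matters mainly for partitioned tables: it registers partition directories that exist on HDFS but are missing from the metastore. A minimal sketch, assuming a hypothetical table events_orc whose ORC files are laid out in dt=... subdirectories under /data/events_orc/ (both the name and the path are invented for illustration):

```sql
-- Hypothetical layout: /data/events_orc/dt=2024-01-01/<ORC files>, ...
CREATE EXTERNAL TABLE IF NOT EXISTS events_orc (
  col1 BIGINT,
  col2 BIGINT)
PARTITIONED BY (dt STRING)
STORED AS ORC
LOCATION '/data/events_orc/';

-- Scan the table location and add any partitions missing from the metastore.
MSCK REPAIR TABLE events_orc;

-- List the partitions Hive now knows about.
SHOW PARTITIONS events_orc;
```

For a non-partitioned external table the repair step should not be needed, since the files under LOCATION are read directly.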
