
Storing Pig output into a Hive table in a single step

I would like to insert the Pig output into Hive tables (the tables are already created in Hive with the exact schema). I just need to insert the output values into those tables. I don't want the usual method, where I first store the output into a file, then read that file from Hive, and then insert into the tables. I want to eliminate that extra hop.

Is it possible? If so, please tell me how this can be done.

Thanks

OK. Create an external Hive table with a schema layout somewhere in an HDFS directory. Let's say:

create external table emp_records(id int,
                                  name String,
                                  city String)
                                  row format delimited
                                  fields terminated by '|'
                                  location '/user/cloudera/outputfiles/usecase1';

Just create the table as above; there is no need to load any file into that directory.

Now write a Pig script that reads data from some input directory, and when you store the output of that Pig script, point it at the same location, as below:

A =  LOAD 'inputfile.txt' USING PigStorage(',') AS(id:int,name:chararray,city:chararray);
B = FILTER A by id >= 678933;
C = FOREACH B GENERATE id,name,city;
STORE C INTO '/user/cloudera/outputfiles/usecase1' USING PigStorage('|');

Ensure that the destination location, the delimiter, and the schema layout of the final FOREACH statement in your Pig script match the Hive DDL schema.
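Because the table is external and its location is the STORE destination, Hive sees the files as soon as the Pig job finishes; no LOAD statement is needed. A quick sanity check from the Hive shell might look like this (the query is illustrative):

```sql
-- After the Pig STORE has written '|'-delimited files into
-- /user/cloudera/outputfiles/usecase1, the external table reads them directly:
select * from emp_records limit 10;
```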

Two approaches to store Pig output into a Hive table are explained below, using an 'Employee' table as an example. (Prerequisite: the Hive table should already be created.)

A = LOAD 'EMPLOYEE.txt' USING PigStorage(',') AS (EMP_NUM:int, EMP_NAME:chararray, EMP_PHONE:int);

Approach 1: Using HCatalog

-- dump Pig result to Hive using HCatalog
store A into 'Empdb.employee' using org.apache.hive.hcatalog.pig.HCatStorer();

(or)

Approach 2: Using HDFS physical location

-- dump Pig result to the external Hive warehouse location
STORE A INTO 'hdfs://<<nmhost>>:<<port>>/user/hive/warehouse/Empdb/employee/' USING PigStorage(',');
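For Approach 2 to work, the table must already be defined over that warehouse path, and its field delimiter must match the PigStorage(',') delimiter used in the STORE. A hypothetical DDL for the assumed Empdb.employee table:

```sql
-- Illustrative DDL matching Approach 2: the table's warehouse directory is
-- the STORE destination, and the ',' delimiter matches PigStorage(',').
create database if not exists Empdb;

create table Empdb.employee(EMP_NUM int,
                            EMP_NAME String,
                            EMP_PHONE int)
                            row format delimited
                            fields terminated by ',';
```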

You can store it using HCatalog:

STORE D INTO 'tablename' USING org.apache.hive.hcatalog.pig.HCatStorer();

See the link below: https://acadgild.com/blog/loading-and-storing-hive-data-into-pig

The best way is to use HCatalog to write the data into the Hive table.

STORE final_data INTO 'Hive_table_name' using org.apache.hive.hcatalog.pig.HCatStorer();

But before storing the data, make sure the columns in the 'final_data' relation exactly match, in name, order, and type, the schema of the table.
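For example, if the Hive table were declared with columns (emp_num int, emp_name string, emp_phone int), a final FOREACH can force the column order, names, and types before storing (the relation and field names here are illustrative):

```pig
-- Illustrative: project and cast fields into the exact order, names, and
-- types of the Hive table schema before handing them to HCatStorer
final_data = FOREACH raw_data GENERATE (int)EMP_NUM        AS emp_num,
                                       (chararray)EMP_NAME AS emp_name,
                                       (int)EMP_PHONE      AS emp_phone;

STORE final_data INTO 'Hive_table_name' USING org.apache.hive.hcatalog.pig.HCatStorer();
```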

And run your Pig script like this: pig -useHCatalog script.pig

