简体   繁体   中英

How to store pig output to the hive table?

I have HDInsight cluster on Azure and .csv files in hdfs (Azure storage).

Using apache-pig I want to process these files and store the output in a hive table. To achieve this I have written following script:

A = LOAD '/test/input/t12007.csv' USING PigStorage(',') AS (year:chararray,ArrTime:chararray,DeptTime:chararray);
describe A;
dump A;
store A into 'testdb.tbl3' using org.apache.hive.hcatalog.pig.HCatStorer();

This script loads the file successfully, describe the structure and it also displays the data using dump but while store command executes it throws the following error:

2017-05-02 06:18:41,476 [main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Failed to parse: Pig script failed to parse: <file script.pig, line 4, column 33> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
Caused by: <file script.pig, line 4, column 33> pig script failed to validate: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 
2017-05-02 06:18:41,484 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve org.apache.hive.hcatalog.pig.HCatStorer using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.] 

pig -useHCatalog

From the Pig HCatalog documentation

Running Pig with HCatalog

Pig does not automatically pick up HCatalog jars. To bring in the necessary jars, you can either use a flag in the pig command or set the environment variables PIG_CLASSPATH and PIG_OPTS as described below. To bring in the appropriate jars for working with HCatalog , simply include the following flag in your script:

Alternate way:

Specify the location of the HCatalog jar and add a REGISTER statement with the path of the jar to the top of your script as below.

REGISTER /usr/username/client/lib/hive-hcatalog-core-1.2.1.2.3.0.0-2557.jar;

Your path may be different as per installation in your cluster. You can find this jar location using command: locate *hcatalog-core*

HCatStorer

HCatStorer is used with Pig scripts to write data to HCatalog-managed tables.

Usage

HCatStorer is accessed via a Pig store statement.

STORE A INTO 'tablename'
   USING org.apache.hive.hcatalog.pig.HCatStorer();

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM