
Is there a way to load part of a Hive table into a Pig relation?

I am currently loading a Hive table into a Pig relation using the code below.

a = LOAD 'hive_db.hive_table' using org.apache.hive.hcatalog.pig.HCatLoader();

This step pulls every record from the Hive table into Pig, but in my current scenario I don't need the whole table. Is there a way to filter out the unwanted records while loading the data from Hive?

No, you can't load a partial table. However, you can filter it right after the LOAD statement. The filter can select specific partitions, or it can drop records based on column values in the loaded table.
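A minimal sketch of filtering on column values after the load, reusing the table name from the question. The column names `id` and `status` are hypothetical and only illustrate the pattern; substitute columns from your own schema. The script assumes Pig is started with HCatalog support (for example `pig -useHCatalog`).

```pig
-- Load the Hive table through HCatalog (table name taken from the question).
a = LOAD 'hive_db.hive_table' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Keep only the rows of interest; 'status' is a hypothetical column.
b = FILTER a BY status == 'active';

-- Keep only the columns needed downstream; 'id' is also hypothetical.
c = FOREACH b GENERATE id, status;

DUMP c;
```

Placing the FILTER and the FOREACH directly after the LOAD keeps the amount of data carried through the rest of the script small, even when the condition is on a non-partition column.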

Examples here

If your Hive table is partitioned, you can load only certain partitions by doing a FILTER statement immediately after your LOAD statement.

From the documentation:

If only some partitions of the specified table are needed, include a partition filter statement immediately following the load statement in the data flow. (In the script, however, a filter statement might not immediately follow its load statement.) The filter statement can include conditions on partition as well as non-partition columns.

A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();
-- date is a partition column; age is not
B = filter A by date == '20100819' and age < 30;

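With this filter, only the partition date == '20100819' is loaded. The pruning applies only to conditions on partition columns; the condition on age, which is not a partition column, is evaluated on the rows read from that partition.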
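The same idea can be written with the two conditions in separate statements, which makes it easier to see which one restricts the partitions. This is a sketch under assumptions: it reuses `tablename`, `date`, and `age` from the example above, and it assumes that a range comparison on the partition column is treated like the equality comparison. The end date '20100822' is made up for illustration.

```pig
A = LOAD 'tablename' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Condition on the partition column only: this is the part that can
-- limit which partitions are read (assumed to hold for ranges as well).
B = FILTER A BY date >= '20100819' AND date <= '20100822';

-- Condition on a non-partition column: applied to the rows already read.
C = FILTER B BY age < 30;
```

Keeping the partition filter as the first statement after the LOAD follows the advice quoted from the documentation.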
