Storing data in HBase using Pig

I am running a Pig script to store data. My code looks like this:

TOP = foreach GROUPED_DATA {
    SORTED = order WEIGHTED_DATA BY review_weight DESC;
    best_review = limit SORTED 1;
    generate group as businessid, flatten (best_review); 
    } 

This code gives me the highest-weighted review for each business, and it returns tuples like this:

ID,      weight,  ID,   user_id, count
(zzxb0Y , 34.2, zzxb0Y, dVK7EAJd, 5 )

I am trying to store this in HBase using the following code:

STORE TOP INTO 'hbase://sample_data' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(sample_col:weight, sample_col:user_id, sample_col:count);

I get an index out of bounds error:

java.lang.Exception: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.io.IOException: java.lang.IndexOutOfBoundsException: Index: 3, Size: 3

I want to store the data in HBase with ID as the row key and, for each ID, the three values stored in the column family. How can I do this?

The TOP relation has 5 fields, but your HBaseStorage column list names only 3 columns. The first field is used as the row key, so TOP should contain only 4 fields. Regenerate TOP so that its output is ID, weight, user_id, count, for example (zzxb0Y, 34.2, dVK7EAJd, 5).

Then use:

STORE TOP INTO 'hbase://sample_data' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('sample_col:weight, sample_col:user_id, sample_col:count');
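
The regeneration step could look roughly like the sketch below. It assumes the fields of TOP arrive in the order shown in the sample tuple (ID, weight, ID, user_id, count) and therefore uses positional references; the relation name TOP_TRIMMED and the aliases weight, user_id and review_count are made up for illustration. The column list is passed to HBaseStorage as a single quoted string.

```pig
-- Keep the row key ($0) and the three values; drop the duplicate ID ($2).
-- Assumption: field positions match the sample tuple in the question.
TOP_TRIMMED = FOREACH TOP GENERATE
    $0 AS businessid,      -- first field becomes the HBase row key
    $1 AS weight,
    $3 AS user_id,
    $4 AS review_count;

-- Three columns for the three non-key fields, in the same order.
STORE TOP_TRIMMED INTO 'hbase://sample_data'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'sample_col:weight sample_col:user_id sample_col:count');
```

With 5 fields and 3 columns, the store has no column for the fourth non-key field, which is consistent with the "Index: 3, Size: 3" in the stack trace. After trimming, the three non-key fields line up one-to-one with the three columns.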

