
How to Store into HBase using Pig and HBaseStorage

In the HBase shell, I created my table via:

create 'pig_table','cf'

In Pig, here are the results of the alias I wish to store into pig_table:

DUMP B;

Produces tuples with 6 fields:

(D1|30|2014-01-01 13:00,D1,30,7.0,2014-01-01 13:00,DEF)
(D1|30|2014-01-01 22:00,D1,30,1.0,2014-01-01 22:00,JKL)
(D10|20|2014-01-01 11:00,D10,20,4.0,2014-01-01 11:00,PQR)
...

The first field is a concatenation of the 2nd, 3rd, and 5th fields, and will be used as the HBase rowkey.
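(For context, a relation shaped like B could be produced along the lines below. This is only a sketch; the upstream relation A and its field names are assumptions, not taken from the original script.)

-- Hypothetical sketch: build the rowkey by joining device_id, cost and start_time with '|'.
B = FOREACH A GENERATE
    CONCAT(CONCAT(CONCAT(CONCAT((chararray)device_id, '|'), (chararray)cost), '|'), (chararray)start_time)
    ,device_id
    ,cost
    ,hours
    ,start_time
    ,code
    ;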

But this STORE statement:

STORE B INTO 'hbase://pig_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ( 'cf:device_id,cf:cost,cf:hours,cf:start_time,cf:code')

results in:

Failed to produce result in "hbase:pig_table"

The logs are giving me:

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataByteArray
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.objToBytes(HBaseStorage.java:924)
at org.apache.pig.backend.hadoop.hbase.HBaseStorage.putNext(HBaseStorage.java:875)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:551)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:85)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:99)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Reduce.runPipeline(PigGenericMapReduce.java:468)
... 11 more

What is wrong with my syntax?

It appears that HBaseStorage does not automatically convert the tuples' data fields into chararray, which is necessary before they can be stored in HBase. I simply cast them like so:

-- Cast each of the 6 fields (the rowkey plus the 5 column values) to chararray.
C = FOREACH B GENERATE
    (chararray)$0
    ,(chararray)$1
    ,(chararray)$2
    ,(chararray)$3
    ,(chararray)$4
    ,(chararray)$5
    ;

STORE C INTO 'hbase://pig_table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage ( 'cf:device_id,cf:cost,cf:hours,cf:start_time,cf:code')
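
Once the STORE succeeds, one way to sanity-check the result without leaving Pig is to read the table back with HBaseStorage. This is just a sketch, not part of the original answer; the relation name V is arbitrary, and it uses HBaseStorage's -loadKey option to return the rowkey as the first field.

-- Read the stored rows back; -loadKey makes the rowkey the first field of each tuple.
V = LOAD 'hbase://pig_table'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'cf:device_id,cf:cost,cf:hours,cf:start_time,cf:code', '-loadKey true')
    AS (rowkey:chararray, device_id:chararray, cost:chararray, hours:chararray, start_time:chararray, code:chararray);
DUMP V;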
