Read from HDFS and write to HBase

The Mapper reads files from two places: 1) articles visited by users (sorted by country), and 2) country statistics (country-wise).

The output of both Mappers is Text, Text.

I am running the program on an Amazon cluster.

My aim is to read data from the two different sets, combine the results, and store them in HBase.

HDFS to HDFS works fine. The job gets stuck at 67% reduce and fails with the following error:

17/02/24 10:45:31 INFO mapreduce.Job:  map 0% reduce 0%
17/02/24 10:45:37 INFO mapreduce.Job:  map 100% reduce 0%
17/02/24 10:45:49 INFO mapreduce.Job:  map 100% reduce 67%
17/02/24 10:46:00 INFO mapreduce.Job: Task Id : attempt_1487926412544_0016_r_000000_0, Status : FAILED
Error: java.lang.IllegalArgumentException: Row length is 0
        at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:565)
        at org.apache.hadoop.hbase.client.Put.<init>(Put.java:110)
        at org.apache.hadoop.hbase.client.Put.<init>(Put.java:68)
        at org.apache.hadoop.hbase.client.Put.<init>(Put.java:58)
        at com.happiestminds.hadoop.CounterReducer.reduce(CounterReducer.java:45)
        at com.happiestminds.hadoop.CounterReducer.reduce(CounterReducer.java:1)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:635)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:390)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

The driver class is:

package com.happiestminds.hadoop;



import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class Main extends Configured implements Tool {

    /**
     * @param args
     * @throws Exception
     */
    public static String outputTable = "mapreduceoutput";

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new Main(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {


        Configuration config = HBaseConfiguration.create();

        try{
            HBaseAdmin.checkHBaseAvailable(config);
        }
        catch(MasterNotRunningException e){
            System.out.println("Master not running");
            System.exit(1);
        }

        Job job = Job.getInstance(config, "Hbase Test");

        job.setJarByClass(Main.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);



        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, ArticleMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, StatisticsMapper.class);

        TableMapReduceUtil.addDependencyJars(job);
        TableMapReduceUtil.initTableReducerJob(outputTable, CounterReducer.class, job);

        //job.setReducerClass(CounterReducer.class);

        job.setNumReduceTasks(1);


        return job.waitForCompletion(true) ? 0 : 1;
    }

}

The reducer class is:

package com.happiestminds.hadoop;

import java.io.IOException;

import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;


public class CounterReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {

    public static final byte[] CF = "counter".getBytes();
    public static final byte[] COUNT = "combined".getBytes();


    @Override
    protected void reduce(Text key, Iterable<Text> values,
            Reducer<Text, Text, ImmutableBytesWritable, Mutation>.Context context)
            throws IOException, InterruptedException {

        String vals = values.toString();
        int counter = 0;

        StringBuilder sbr = new StringBuilder();
        System.out.println(key.toString());
        for (Text val : values) {
            String stat = val.toString();
            if (stat.equals("***")) {
                counter++;
            } else {
                sbr.append(stat + ",");
            }

        }
        sbr.append("Article count : " + counter);


        Put put = new Put(Bytes.toBytes(key.toString()));
        put.addColumn(CF, COUNT, Bytes.toBytes(sbr.toString()));
        if (counter != 0) {
            context.write(null, put);
        }

    }



}

Dependencies:

<dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.3</version>
        </dependency>



        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>1.2.2</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-common</artifactId>
            <version>1.2.2</version>
        </dependency>


        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>1.2.2</version>
        </dependency>



    </dependencies>

A good practice is to validate your values before submitting them anywhere. In your particular case you can validate your key and sbr, or wrap them in a try-catch block with a proper notification policy. You should write them to a log when they are not correct, and update your unit tests with new test cases:

try {
    Put put = new Put(Bytes.toBytes(key.toString()));
    put.addColumn(CF, COUNT, Bytes.toBytes(sbr.toString()));
    if (counter != 0) {
        context.write(null, put);
    }
} catch (IllegalArgumentException ex) {
    System.err.println("Error processing record - Key: " + key.toString() + ", values: " + sbr.toString());
}

According to the exception thrown by the program, it is clear that the key length is 0, so check whether the key length is 0 before putting the record into HBase, and only issue the put when it is not (a short sketch of this check follows below).

For more clarity on why a row key of length 0 is not supported by HBase:

The HBase data model does not allow a 0-length row key; it must be at least 1 byte. A 0-byte row key is reserved for internal use (to designate empty start and end keys).
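A minimal sketch of that check, assuming it replaces the Put construction at the end of the existing reduce() (counter, sbr, CF and COUNT are the fields already defined in CounterReducer above):

// Skip records whose row key would be empty: HBase rejects 0-length row keys.
String rowKey = key.toString();
if (rowKey.isEmpty()) {
    System.err.println("Skipping record with empty row key, values: " + sbr.toString());
    return;
}

Put put = new Put(Bytes.toBytes(rowKey));
put.addColumn(CF, COUNT, Bytes.toBytes(sbr.toString()));
if (counter != 0) {
    context.write(null, put);
}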

Can you check whether you are inserting any null values?

The HBase data model does not allow a zero-length row key; it must be at least 1 byte.

Please check in your reducer code, before executing the put command, whether some of the values are null.
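In practice the empty key usually originates in one of the mappers, for example from a blank or malformed input line. A hypothetical sketch of a mapper-side guard; the tab-separated layout and field positions are assumptions, not taken from the original ArticleMapper:

// Hypothetical mapper: skip lines that would otherwise produce an empty row key.
// (imports: org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text,
//  org.apache.hadoop.mapreduce.Mapper, java.io.IOException)
public static class ArticleMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split("\t");
        // A blank or malformed line would lead to "Row length is 0" at Put time.
        if (fields.length < 2 || fields[0].trim().isEmpty()) {
            return;
        }
        context.write(new Text(fields[0].trim()), new Text(fields[1]));
    }
}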

The error you get is quite self-explanatory. Row keys in HBase can't be empty (though values can be).

@Override
protected void reduce(Text key, Iterable<Text> values,
        Reducer<Text, Text, ImmutableBytesWritable, Mutation>.Context context)
        throws IOException, InterruptedException {
    if (key == null || key.getLength() == 0) {
      // Log a warning about the empty key.
      return;
    }
    // Rest of your reducer follows.
}
