
How can I read from one HBase instance but write to another?

Currently I have two HBase tables (let's call them tableA and tableB). Using a single-stage MapReduce job, the data in tableA is read, processed, and saved to tableB. Currently both tables reside on the same HBase cluster. However, I need to relocate tableB to its own cluster.

Is it possible to configure a single-stage MapReduce job in Hadoop to read from and write to separate instances of HBase?

It is possible. HBase's CopyTable MapReduce job does this by using TableMapReduceUtil.initTableReducerJob(), which lets you set an alternative quorumAddress when you need to write to a remote cluster:

public static void initTableReducerJob(String table, Class<? extends TableReducer> reducer, org.apache.hadoop.mapreduce.Job job, Class partitioner, String quorumAddress, String serverClass, String serverImpl)

quorumAddress - Distant cluster to write to; default is null for output to the cluster that is designated in hbase-site.xml. Set this String to the ZooKeeper ensemble of an alternate remote cluster when you would have the reduce write to a cluster other than the default; e.g. when copying tables between clusters, the source would be designated by hbase-site.xml and this param would have the ensemble address of the remote cluster. The format to pass is particular. Pass <hbase.zookeeper.quorum>:<hbase.zookeeper.client.port>:<zookeeper.znode.parent> such as server,server2,server3:2181:/hbase.
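
As a rough illustration of that format, the sketch below assembles the quorumAddress string from its three parts using only the JDK. ClusterKey and build are hypothetical names introduced here, not part of HBase, and the host names are placeholders for the remote cluster's ZooKeeper ensemble.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of HBase): builds the string expected by the
// quorumAddress parameter, i.e. <quorum hosts>:<client port>:<znode parent>.
public final class ClusterKey {

    private ClusterKey() {}

    public static String build(List<String> quorumHosts, int clientPort, String znodeParent) {
        if (quorumHosts.isEmpty()) {
            throw new IllegalArgumentException("at least one ZooKeeper host is required");
        }
        if (!znodeParent.startsWith("/")) {
            throw new IllegalArgumentException("znode parent must start with '/'");
        }
        // Hosts are comma-separated; the three parts are colon-separated.
        return String.join(",", quorumHosts) + ":" + clientPort + ":" + znodeParent;
    }

    public static void main(String[] args) {
        // Placeholder host names for the remote cluster's ZooKeeper ensemble.
        String key = build(Arrays.asList("server", "server2", "server3"), 2181, "/hbase");
        System.out.println(key);
        // The result would then be passed as the fifth argument, for example:
        // TableMapReduceUtil.initTableReducerJob(
        //         "tableB", IdentityTableReducer.class, job, null, key, null, null);
    }
}
```

With this approach the job keeps reading tableA from the cluster named in hbase-site.xml, and only the output side is pointed at the remote ensemble.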


Another option is to implement your own custom reducer that writes to the remote table directly instead of writing to the context. Something similar to this:

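import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Nested inside the job's driver class. Uses the HBase 1.0+ client API; on
// older clients, HConnectionManager.createConnection(config) with
// setAutoFlush(false) / flushCommits() played the same role.
public static class MyReducer extends Reducer<Text, Result, Text, Text> {

    protected Connection connection;
    protected BufferedMutator remoteTable; // buffers puts on the client side

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        super.setup(context);
        // Clone configuration and provide a new quorum address for the remote cluster
        Configuration config = HBaseConfiguration.create(context.getConfiguration());
        config.set("hbase.zookeeper.quorum", "quorum1,quorum2,quorum3");
        connection = ConnectionFactory.createConnection(config);
        BufferedMutatorParams params = new BufferedMutatorParams(TableName.valueOf("myTable"))
                .writeBufferSize(1024L * 1024L * 10L); // 10MB buffer
        remoteTable = connection.getBufferedMutator(params);
    }

    @Override
    public void reduce(Text boardKey, Iterable<Result> results, Context context) throws IOException, InterruptedException {
        /* Build Put objects from the results and pass them to remoteTable.mutate(...) */
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        super.cleanup(context);
        if (remoteTable != null) {
            remoteTable.flush(); // send any puts still sitting in the buffer
            remoteTable.close();
        }
        if (connection != null) {
            connection.close();
        }
    }
}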
