HBase mapreduce job - Multiple scans - How to set the table of each Scan
I am using HBase 1.2, and I want to run a MapReduce job over HBase using multiple scans. The API provides:

TableMapReduceUtil.initTableMapperJob(List<Scan> scans, Class<? extends TableMapper> mapper, Class<?> outputKeyClass, Class<?> outputValueClass, org.apache.hadoop.mapreduce.Job job)

But how do I specify the table for each scan? I am using the code below:
List<Scan> scans = new ArrayList<>();
for (String firstPart : firstParts) {
    Scan scan = new Scan();
    scan.setRowPrefixFilter(Bytes.toBytes(firstPart));
    scan.setCaching(500);
    scan.setCacheBlocks(false);
    scans.add(scan);
}
TableMapReduceUtil.initTableMapperJob(scans, MyMapper.class, Text.class, Text.class, job);
It throws the following exception:
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:436)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.initialize(TableInputFormat.java:184)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:241)
at org.apache.hadoop.hbase.mapreduce.TableInputFormat.getSplits(TableInputFormat.java:240)
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:115)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:305)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:322)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:200)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
I think this is expected, since the table each scan should run against is never specified anywhere.
But how do I specify it?
I tried adding
scan.setAttribute("scan.attributes.table.name", Bytes.toBytes("my_table"));
but it gives the same error.
List<Scan> scans = new ArrayList<Scan>();

Scan scan1 = new Scan();
scan1.setStartRow(firstRow1);
scan1.setStopRow(lastRow1);
scan1.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, table1);
scans.add(scan1);

Scan scan2 = new Scan();
scan2.setStartRow(firstRow2);
scan2.setStopRow(lastRow2);
scan2.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, table2);
scans.add(scan2);

TableMapReduceUtil.initTableMapperJob(scans, TableMapper.class, Text.class, IntWritable.class, job);
This uses Scan.SCAN_ATTRIBUTES_TABLE_NAME to bind each scan to its table.
You are getting this NPE because the table is not set at the Scan-instance level.
Follow the example below, where the table name is set inside the for loop rather than outside of it; then it should work:
List<Scan> scans = new ArrayList<Scan>();
for (int i = 0; i < 3; i++) {
    Scan scan = new Scan();
    scan.addFamily(INPUT_FAMILY);
    scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(TABLE_NAME));
    if (start != null) {
        scan.setStartRow(Bytes.toBytes(start));
    }
    if (stop != null) {
        scan.setStopRow(Bytes.toBytes(stop));
    }
    scans.add(scan);
    LOG.info("scan before: " + scan);
}
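Putting the two pieces together, here is a minimal sketch of how the original question's job setup could be fixed. This is an illustrative assumption, not verified code: MultiScanJob, the configure method, and the single tableName parameter are hypothetical names, and MyMapper stands in for the question's (unshown) mapper class. Only the HBase 1.2 APIs already used above appear here.

```java
// Sketch: build one Scan per row prefix, tag each scan with its target
// table via Scan.SCAN_ATTRIBUTES_TABLE_NAME, then register them all as a
// single multi-scan MapReduce job. Requires a running HBase/Hadoop setup.
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class MultiScanJob {

    // firstParts: the row prefixes from the question; tableName: the table
    // each of these scans should read (a different name could be set per scan).
    public static void configure(Job job, List<String> firstParts, String tableName)
            throws Exception {
        List<Scan> scans = new ArrayList<>();
        for (String firstPart : firstParts) {
            Scan scan = new Scan();
            scan.setRowPrefixFilter(Bytes.toBytes(firstPart));
            scan.setCaching(500);
            scan.setCacheBlocks(false);
            // The crucial line: tells the multi-table input format which
            // table this particular scan reads, avoiding the NPE above.
            scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME,
                    Bytes.toBytes(tableName));
            scans.add(scan);
        }
        // MyMapper is the question's TableMapper subclass (not shown here).
        TableMapReduceUtil.initTableMapperJob(
                scans, MyMapper.class, Text.class, Text.class, job);
    }
}
```

Because each Scan carries its own table-name attribute, scans over different tables can be mixed in the same list and the same job.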