
HBase MapReduce split scan for different mappers

Original post: 2013-04-18 12:56:15. Tags: hadoop, mapreduce, hbase, mapper

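I am struggling to distribute my HBase rows properly across several map tasks. My aim is to split my scan by row key and hand each resulting set of rows to one map task.

So far I can only define a single scan, and my mappers always receive one row at a time. That is not what I want: I need each mapper to receive a set of rows as its input.

So is there any way to split my HBase table, or rather the scan, into n sets of rows that then become the input for n mappers?

I am not looking for a solution that starts one MapReduce job to write n files and a second MapReduce job to read them back as text input in order to get these sets.

Thanks in advance!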

2 answers

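Mappers always get one row at a time; that is how map-reduce works. If you want to relate multiple rows on the map side, you can either do it yourself (e.g. by keeping some state in a static or instance variable) or write your logic as a combiner, which is a map-side "reduce" step.

Note that you would still need a reducer to handle the edge cases where related keys are processed by different mappers. Because HBase keys are sorted on disk, this only happens at the end or beginning of a split. You can lower the risk of it happening by pre-splitting the table.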
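To illustrate the "do it yourself" option, here is a minimal sketch of collecting rows into sets inside one mapper. `RowBatcher` is a hypothetical helper, not part of HBase or Hadoop, and it is written in plain Java so it runs without a cluster. In a real job I would assume `add` is called from `TableMapper.map()` and `flush` from `Mapper.cleanup()`. Hadoop may reuse key objects between `map()` calls, so copy whatever you buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not an HBase/Hadoop class): collects rows that arrive
 * one at a time and hands them on as sets of at most batchSize rows.
 */
final class RowBatcher<R> {
    private final int batchSize;
    private final Consumer<List<R>> handler;
    private final List<R> buffer = new ArrayList<>();

    RowBatcher(int batchSize, Consumer<List<R>> handler) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.handler = handler;
    }

    /** Call once per row, e.g. from map(). Emits a set when the buffer is full. */
    void add(R row) {
        buffer.add(row);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    /** Call at the end of the task, e.g. from cleanup(), to emit the remainder. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        handler.accept(new ArrayList<>(buffer)); // copy so the handler may keep the list
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> seen = new ArrayList<>();
        RowBatcher<String> batcher = new RowBatcher<>(2, seen::add);
        for (String key : new String[] {"row1", "row2", "row3"}) {
            batcher.add(key);
        }
        batcher.flush();
        System.out.println(seen); // [[row1, row2], [row3]]
    }
}
```

Sets built this way never cross a split boundary, which is why the reducer mentioned above is still needed for related keys that land in different mappers.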

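After studying the implementation more closely, I saw that calling the map step with a single Scan resulted in exactly one mapper being used. That is why the input set was not split at all.

If you build a List of Scans and pass it to the TableMapReduceUtil.initTableMapperJob function, the input set is split per scan. That way you can define the number of mappers used in the MapReduce job.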
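As a rough sketch of how the row-key ranges for such a list of scans could be computed: `KeyRangeSplitter` below is a hypothetical helper, and it assumes row keys are zero-padded decimal strings of fixed width, which the question does not state. It is plain Java so it runs without HBase. The comments outline how I would expect each range to be turned into a `Scan`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: cuts the numeric key space [0, keyCount) into n
 * contiguous ranges. Assumes zero-padded decimal row keys of fixed width, so
 * that lexicographic order (the order HBase uses) matches numeric order.
 */
final class KeyRangeSplitter {

    /**
     * Returns n {startRow, stopRow} pairs. stopRow is exclusive; the last
     * stopRow is "" because an empty stop row means "to the end of the table".
     */
    static List<String[]> split(long keyCount, int n, int width) {
        if (n < 1 || keyCount < n) {
            throw new IllegalArgumentException("need 1 <= n <= keyCount");
        }
        String fmt = "%0" + width + "d";
        List<String[]> ranges = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long start = keyCount * i / n;      // inclusive
            long stop = keyCount * (i + 1) / n; // exclusive
            String stopRow = (i == n - 1) ? "" : String.format(fmt, stop);
            ranges.add(new String[] {String.format(fmt, start), stopRow});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (String[] r : split(1000, 4, 4)) {
            // With HBase on the classpath, each pair would become one Scan, roughly:
            //   Scan s = new Scan();
            //   s.setStartRow(Bytes.toBytes(r[0]));  // withStartRow(...) in newer versions
            //   s.setStopRow(Bytes.toBytes(r[1]));   // withStopRow(...) in newer versions
            //   s.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(tableName));
            // and the resulting List<Scan> would be passed to
            // TableMapReduceUtil.initTableMapperJob(scans, ...).
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

As far as I know, the input format can still create more than one split for a scan whose range spans several regions, so treat n scans as a lower bound on the number of mappers rather than an exact count.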

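Another way is to extend the TableInputFormat class and override its getSplits method.

As Arnon Rotem-Gal-Oz said, inside the mapper's map function you can still only access one row at a time.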

