How to control the number of mappers per region server for reading an HBase table
I have an HBase table (written through Apache Phoenix) that needs to be read and written out to a flat text file. The current bottleneck is that the table has 32 salt buckets, so the job opens only 32 mappers to read it. Once the data grows past 100 billion rows this becomes very time-consuming. Can someone point me to how to control the number of mappers per region server when reading an HBase table? I have also seen the program at https://gist.github.com/bbeaudreault/9788499 , but it does not have a driver program that explains it fully. Can someone help?
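For context on why the parallelism is capped at 32: Phoenix salting prefixes every row key with one byte derived from a hash of the key modulo `SALT_BUCKETS`, so the table is pre-split into exactly that many regions, and a table scan gets one input split per region. A minimal sketch of the idea (this uses a simple rolling hash purely for illustration, not Phoenix's actual internal hash function):

```java
// Sketch of Phoenix-style salting: each row key gets a one-byte prefix
// computed from a hash of the key modulo the bucket count, so rows are
// spread across SALT_BUCKETS regions. Illustration only, not Phoenix's
// real hash.
public class SaltSketch {
    static final int SALT_BUCKETS = 32; // as in the question's table

    // Compute the salt bucket for a row key.
    static int saltBucket(byte[] rowKey) {
        int hash = 0;
        for (byte b : rowKey) {
            hash = 31 * hash + b; // simple rolling hash, illustration only
        }
        return Math.abs(hash % SALT_BUCKETS);
    }

    public static void main(String[] args) {
        // Every key lands in one of 32 buckets, so a MapReduce scan of
        // the table sees at most 32 splits (one per bucket/region).
        int salt = saltBucket("row-000123".getBytes());
        System.out.println("salt bucket = " + salt);
    }
}
```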
In my observation, the number of regions in the table = the number of mappers opened by the framework.
So reducing the number of regions will in turn reduce the number of mappers.
1) Pre-split the HBase table while creating it, for example on prefixes 0-9.
2) Load all the data into these regions by generating row keys with a prefix between 0 and 9.
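The two steps above can be sketched as follows. Building the split keys is plain Java; the `createTable` call is shown only as a comment because it needs a live cluster, and the `admin`/`tableDescriptor` names there are hypothetical:

```java
// Sketch: build split keys so an HBase table created with them has ten
// regions covering row-key prefixes '0' through '9'.
public class PreSplit {
    // Nine split points ('1'..'9') yield ten regions:
    // [start,'1'), ['1','2'), ... ['9',end).
    static byte[][] splitKeys() {
        byte[][] splits = new byte[9][];
        for (int i = 0; i < 9; i++) {
            splits[i] = new byte[] { (byte) ('1' + i) };
        }
        return splits;
    }

    // Step 2: prefix each row key with a digit 0-9 so writes spread
    // evenly across the ten regions (simple modulo for illustration).
    static String prefixedKey(String key) {
        int prefix = Math.abs(key.hashCode() % 10);
        return prefix + key;
    }

    public static void main(String[] args) {
        // With a live cluster you would pass the splits to createTable,
        // e.g. (hypothetical names):
        // admin.createTable(tableDescriptor, splitKeys());
        for (byte[] k : splitKeys()) {
            System.out.println(new String(k));
        }
        System.out.println(prefixedKey("order-42"));
    }
}
```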
Also, have a look at apache-hbase-region-splitting-and-merging.
Moreover, setting the number of mappers does not guarantee that many will actually be opened; the count is driven by the input splits.
You can change the number of mappers using setNumMapTasks or conf.set('mapred.map.tasks','numberofmappersyouwanttoset') (but this is only a suggestion to the configuration).
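To make the point above concrete: with HBase's `TableInputFormat` the framework creates one input split per region, so the effective mapper count equals the region count and the `mapred.map.tasks` hint is ignored. A toy illustration of that behaviour (the helper below is hypothetical, not a Hadoop API):

```java
// Toy illustration: for an HBase TableInputFormat job the number of
// mappers equals the number of input splits, i.e. one per region.
// The mapred.map.tasks setting is only a hint and cannot raise it.
public class MapperCount {
    // Hypothetical helper, NOT a Hadoop API: models the mapper count
    // the framework actually uses for a region-based table scan.
    static int effectiveMappers(int regions, int requestedHint) {
        return regions; // the hint is ignored for region-based splits
    }

    public static void main(String[] args) {
        // A 32-salt-bucket table has 32 regions, so even if the job
        // requests 200 mappers it still gets 32.
        System.out.println(MapperCount.effectiveMappers(32, 200));
    }
}
```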
About the link you provided: I don't know how it works; you could check with the author.