
HBase table as MapReduce input?

Original post: 2015-04-23 20:09:38 | Tags: hadoop, mapreduce, hbase, nosql

What are the pros and cons of using an HBase table as the input to a MapReduce job? How does it affect performance?

1 Answer

Pros:
1. Point lookups are possible, so the job does not have to read the whole data set.
2. The reduce phase can be avoided entirely when HBase is the input source, because the complete data for a given row key can be fetched together.
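To make the two points above concrete, here is a minimal sketch in plain Python. It does not use the HBase API; it only models a table as rows sorted by row key, which is an assumption made for illustration. The names `TABLE`, `scan`, and `map_only_totals` are hypothetical. The range scan touches only the rows between the start and stop keys, and because each row already holds every value stored under its key, the per-key total needs no reduce step.

```python
from bisect import bisect_left

# Toy stand-in for an HBase table: rows kept sorted by row key,
# each row holding every column value stored under that key.
TABLE = sorted({
    "user1": {"clicks:mon": 3, "clicks:tue": 5},
    "user2": {"clicks:mon": 1},
    "user3": {"clicks:mon": 7, "clicks:wed": 2},
    "user4": {"clicks:tue": 4},
}.items())
KEYS = [key for key, _ in TABLE]


def scan(start_row, stop_row):
    """Yield rows with start_row <= key < stop_row, skipping all others."""
    lo = bisect_left(KEYS, start_row)
    hi = bisect_left(KEYS, stop_row)
    for i in range(lo, hi):
        yield TABLE[i]


def map_only_totals(rows):
    """Map step only: each row already carries all values for its key."""
    return {key: sum(columns.values()) for key, columns in rows}


totals = map_only_totals(scan("user2", "user4"))
print(totals)  # {'user2': 1, 'user3': 9}
```

In a real job the same idea would be expressed by giving the scan a start and stop row and setting the number of reduce tasks to zero; the exact calls depend on the HBase version in use.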

Cons:
1. If the HBase block size is not tuned properly, scanning a very small subset may end up reading all of the underlying data (in the worst case, a 1% read can turn into reading 100% of the data).
2. For a full scan, reading directly from HDFS is the preferred choice.
3. HBase can put unnecessary load on HDFS when data locality is lost because regions have moved between region servers, since reads then have to go to remote DataNodes.
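The block size point above can be checked with simple arithmetic. The sketch below is a simplified model, not HBase code: it assumes fixed-size blocks measured in rows (real HBase blocks are sized in bytes) and that a block must be read in full if any wanted row lives in it. The function `blocks_touched` and the constants are hypothetical names chosen for the example.

```python
def blocks_touched(wanted_rows, rows_per_block):
    """Return the indexes of the blocks that hold at least one wanted row."""
    return {row // rows_per_block for row in wanted_rows}


TOTAL_ROWS = 10_000


def fraction_read(wanted_rows, rows_per_block):
    """Fraction of all blocks that must be read to serve wanted_rows."""
    total_blocks = TOTAL_ROWS // rows_per_block
    return len(blocks_touched(wanted_rows, rows_per_block)) / total_blocks


# Both requests ask for 1% of the rows (100 out of 10,000).
clustered = range(0, 100)              # 100 adjacent rows
scattered = range(0, TOTAL_ROWS, 100)  # one row out of every 100

print(fraction_read(clustered, 100))   # 0.01: one block out of 100
print(fraction_read(scattered, 100))   # 1.0: every block is touched
print(fraction_read(scattered, 10))    # 0.1: smaller blocks waste less
```

Under these assumptions, the same 1% request costs anywhere from 1% to 100% of the data depending on how the wanted rows are spread across blocks, and shrinking the block size narrows that gap for scattered reads.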

Overall, it all depends on how well HBase has been tuned for your read/write patterns.