
HBase table as MapReduce input?

What are the pros and cons of using an HBase table as the input to a MapReduce job, and how does it affect performance?

Pros:

  1. Point lookups are possible, which eliminates the need to read the whole data set.
  2. The reduce phase can often be avoided entirely when HBase is the input source, because the complete data for a given key can be fetched together.
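As a rough illustration of the first point, here is a toy model in plain Python (not HBase client code). It assumes rows are kept sorted by row key, which is how HBase stores them, and uses a hypothetical `range_scan` helper to show that a scan bounded by start and stop keys only touches the rows in that range instead of the whole table.

```python
from bisect import bisect_left


def range_scan(sorted_keys, start, stop):
    """Return the keys in [start, stop) from a list sorted by row key.

    Toy stand-in for a scan bounded by a start row and a stop row:
    binary search finds both ends, so rows outside the range are never read.
    """
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Hypothetical table of 10,000 row keys, already in sorted order.
keys = [f"user{n:05d}" for n in range(10_000)]

hits = range_scan(keys, "user00100", "user00110")
print(len(hits), "of", len(keys), "rows read")  # 10 of 10000 rows read
```

A job that reads flat files has no such index and would have to stream all 10,000 records and filter them in the mapper.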

Cons:

  1. If the HBase block size is not tuned properly, scanning a very small set of rows may end up reading all of the underlying data (in the worst case, a 1% read can lead to reading 100% of the data).
  2. For a full scan, reading directly from HDFS is the preferred choice.
  3. HBase can put unnecessary load on HDFS if data locality is not maintained because regions have moved across region servers.
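The "1% read, 100% of the data" worst case in the first con can be sketched with simple arithmetic. This is a simplified model, not a measurement: it assumes a block is always read in full, that every block holds the same number of rows, and the `blocks_read` helper and the row counts are made up for illustration.

```python
def blocks_read(total_rows, rows_per_block, wanted_rows):
    """Return (blocks_touched, total_blocks) for the given wanted row indexes.

    A block counts as touched as soon as one wanted row lives in it,
    because the model assumes a block is always read in full.
    """
    total_blocks = -(-total_rows // rows_per_block)  # ceiling division
    touched = {row // rows_per_block for row in wanted_rows}
    return len(touched), total_blocks


# 1% of 10,000 rows, spread evenly: one wanted row every 100 rows.
scattered = range(0, 10_000, 100)
# The same number of rows, stored next to each other.
clustered = range(0, 100)

print(blocks_read(10_000, 100, scattered))  # (100, 100): every block is read
print(blocks_read(10_000, 100, clustered))  # (1, 100): a single block is read
print(blocks_read(10_000, 10, scattered))   # (100, 1000): smaller blocks, 10% read
```

Under these assumptions, the same 1% of rows costs anywhere from 1% to 100% of the blocks, depending on how the wanted rows are laid out relative to the block size, which is why the block size has to match the access pattern.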

Overall, it all depends on how well HBase has been tuned for your read/write patterns.
