HBase MapReduce interaction

I have a program that uses HBase and MapReduce.

I store my data in HDFS as a single file of about 100 GB. I have now loaded this data into HBase.

Scanning the file with MapReduce takes 5 minutes, but scanning the HBase table takes 30 minutes.

How can I increase the speed when using HBase with MapReduce?

Thanks.

I am assuming you are running a single-node HDFS. If your 100 GB file were on a multi-node HDFS cluster, it would be much faster for both MapReduce and Hive.

You could try increasing the number of mappers and reducers in MapReduce to gain some performance; have a look at this post.
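To get a feel for how many mappers a file scan can use, here is a rough back-of-the-envelope sketch. It assumes the file is splittable and that each input split becomes one map task, so the count is roughly the file size divided by the split size. The function name `estimate_map_tasks` and the split sizes tried are only illustrative, and the real split logic also depends on your min/max split settings. As far as I know, a plain HBase table scan through `TableInputFormat` gets one map task per region by default, so a table with few regions will run far fewer mappers than this.

```python
def estimate_map_tasks(file_size_bytes, split_size_bytes):
    """Rough number of input splits (one map task each) for one splittable file.

    Simplified: ignores Hadoop's min/max split-size settings and its
    tolerance for a slightly oversized last split.
    """
    if file_size_bytes < 0:
        raise ValueError("file size must not be negative")
    if split_size_bytes <= 0:
        raise ValueError("split size must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return (file_size_bytes + split_size_bytes - 1) // split_size_bytes


MIB = 1024 ** 2
GIB = 1024 ** 3

if __name__ == "__main__":
    file_size = 100 * GIB  # the 100 GB file from the question
    # Hypothetical split sizes; 128 MiB is the usual HDFS block size default.
    for split_mib in (256, 128, 64):
        tasks = estimate_map_tasks(file_size, split_mib * MIB)
        print(f"split size {split_mib} MiB -> about {tasks} map tasks")
```

A smaller split size gives more map tasks, but they only run in parallel if the cluster has enough free task slots or containers, which a single node will not have.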

Hive is essentially a data warehousing tool built on top of HDFS, and every query is itself a MapReduce job underneath. So the post above should address this problem as well.


 