
How to use HBase's RowCounter class to get the number of rows in a table?

When using

$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>

how do I specify which MapReduce cluster it should use to count the rows in my table (per the RowCounter documentation on the hbase.apache website)?

I ran the above command from the command line and it returned the row count. However, it took over 2 hours, because the job ran on localhost rather than on the Hadoop cluster. Running the following from the HBase shell took only 10 minutes:

count 'tablename'

Before someone asks why I can't just run this command from the HBase shell: I also have a table that took 1 hour to return its row count there. I expected RowCounter to be faster because it runs a MapReduce job, whereas the HBase shell's `count` does not use MapReduce; it scans the table from the client.

I won't admit to user error, but apparently the user account I ran the command as didn't have access to the Hadoop cluster, so no MapReduce job was submitted to the cluster and the command ran a local MapReduce job instead. It finished, but took 2 hours to complete.

When I ran it as a user that did have permission, the job completed in 30 seconds and DID use the MapReduce cluster to divide and conquer the work.
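To illustrate why the cluster run was so much faster, here is a toy Python sketch of the divide-and-conquer idea. It is not HBase code: the region lists and the functions `count_region`, `count_rows`, and `count_rows_serial` are hypothetical stand-ins. RowCounter runs a map-only job with one map task per region, each counting its own region's rows, while the shell's `count` walks the whole table through a single client-side scanner.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical model of a table: each inner list holds the row keys of one
# region. This is a toy, not the HBase client API.
Region = List[str]


def count_region(region: Region) -> int:
    """'Map' step: count the rows of a single region."""
    return sum(1 for _ in region)


def count_rows(regions: List[Region], workers: int = 4) -> int:
    """Count every region concurrently, then add up the partial counts.

    The thread pool stands in for the cluster's map tasks. In CPython the
    GIL means threads will not speed up this pure-Python loop; the point
    is the structure (independent per-region work plus a final sum), which
    is what lets a real cluster spread the scan across many machines.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_region, regions))
    return sum(partial_counts)


def count_rows_serial(regions: List[Region]) -> int:
    """One scanner walking every region in order, like the shell's count."""
    total = 0
    for region in regions:
        for _row_key in region:
            total += 1
    return total


if __name__ == "__main__":
    # Three hypothetical regions of different sizes (4000 rows in total).
    table = [
        [f"row{i:05d}" for i in range(0, 1000)],
        [f"row{i:05d}" for i in range(1000, 3500)],
        [f"row{i:05d}" for i in range(3500, 4000)],
    ]
    print(count_rows(table), count_rows_serial(table))  # 4000 4000
```

Both functions return the same total; the difference is only in how the work is split. When the job runs locally instead of on the cluster, all of the per-region work lands on one machine, so most of the benefit disappears.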

I'm posting this answer in case someone runs into the same problem I did; hopefully it will save them some time.
