How to use HBase's RowCounter class to get the number of rows in a table?

Question

When using

$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>

how do I specify which MapReduce cluster to use to count the rows in the given table (per this link from the hbase.apache website)?

I ran the above command from the command line and it returned the row count. However, it took over 2 hours because it ran on localhost rather than on a Hadoop cluster. Running the following from the HBase shell took 10 minutes:

count 'tablename'

Before someone asks why I can't just run this from the HBase shell: I have a table that took 1 hour to return the row count. I thought RowCounter would be faster, since it uses a MapReduce job to count the rows, whereas the HBase shell, as far as I know, does not use MapReduce.

Answer 1

I won't admit to user error, but apparently the user I ran the command as didn't have access to the Hadoop cluster, so no MapReduce job was created on the cluster and the command ran a local MR job instead. It finished, but took 2 hours to complete.

When I ran it as a user that did have permission, the job completed in 30 seconds and DID use the MR cluster to divide and conquer the work.

Posting this answer in case someone runs into the same problem I did; hopefully it will save them some time.
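One way to tell whether RowCounter fell back to a local job, as described above, is to look at the job ID in the console output: Hadoop's local job runner assigns IDs that start with `job_local`, while jobs submitted to the cluster do not. The following is only a minimal sketch, assuming the command's output was saved to a file; the helper name `ran_locally` and the file name `rowcounter.log` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase or Hadoop): exits 0 if the saved
# job output mentions a local-runner job ID, i.e. the job did NOT run on
# the cluster. Exits non-zero otherwise.
ran_locally() {
  grep -q 'job_local' "$1"
}

# Usage sketch (rowcounter.log is an assumed file name):
#   hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> 2>&1 | tee rowcounter.log
#   if ran_locally rowcounter.log; then
#     echo "RowCounter ran as a local job, not on the cluster"
#   fi
```

If the check reports a local job, the account's environment is a reasonable first thing to inspect. On Hadoop 2 and later, the `mapreduce.framework.name` property in the client configuration decides whether jobs go to YARN or run locally, so an account that does not pick up the cluster's configuration may be one plausible cause, though not necessarily the one that applied here.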

1 answer

solution1
0 ACCEPTED 2016-07-25 22:53:35
