How to use HBase's RowCounter class to get the number of rows in a table?

When using

$ hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename>

how do I specify a MapReduce cluster to use to count rows in my specified table (per this link from the hbase.apache website)?

I ran the above command on my command line and it returned the row count. However, it took over 2 hours to return the count because it ran on localhost, not on a Hadoop cluster. It took me 10 minutes to run from the hbase shell using:

count 'tablename'

Before someone asks why I can't just run this command from the hbase shell: I have a table that took 1 hour to return the row count. I thought RowCounter would be faster, since it uses a MapReduce job to return the row count, as opposed to the hbase shell, which I don't think uses MapReduce.

I won't admit to user error, but apparently the user I ran the command as didn't have access to the Hadoop cluster, so no MapReduce job was created on the cluster and the command created a local MR job instead. It finished, but took 2 hours to complete.

When I found a user that did have permission, the job completed in 30 seconds and did use the MR cluster to divide and conquer the job.
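One way to tell which of the two cases you are in is to look at the job ID in the RowCounter log output. The sketch below is a minimal helper, not part of HBase: the function name job_mode and the log file name rowcounter.log are made up for illustration, and it assumes that IDs from Hadoop's local job runner start with "job_local" while cluster job IDs do not.

```shell
#!/bin/sh
# Hypothetical helper (not part of HBase or Hadoop): classify a MapReduce
# job ID. Assumption: the local job runner produces IDs that start with
# "job_local"; any other "job_..." ID is treated as a cluster job.
job_mode() {
  case "$1" in
    job_local*) echo "local" ;;
    job_*)      echo "cluster" ;;
    *)          echo "unknown" ;;
  esac
}

# Possible usage (commented out because it needs a working HBase install;
# rowcounter.log is an arbitrary file name chosen for this example):
#   hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> 2>&1 | tee rowcounter.log
#   job_id=$(grep -o 'job_[A-Za-z0-9_]*' rowcounter.log | head -n 1)
#   job_mode "$job_id"
```

If this reports "local", the job never reached the cluster, and the cause may be the same permission problem described above.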

Posting this answer in case someone runs into the same problem I did; hopefully it will save them time.
