
Writing high volume reducer output to HBase

I have a Hadoop MapReduce job whose output is a row-id with a Put/Delete operation for that row-id. Due to the nature of the problem, the output is rather high volume. We have tried several methods to get this data back into HBase and they have all failed...

Table Reducer

This is way too slow, since it seems that it must do a full round trip for every row. Due to how the keys sort for our reducer step, the row-id is not likely to be on the same node as the reducer.
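If per-row round trips really are the bottleneck, the usual remedy is to buffer mutations on the client and send them in batches. The HBase client provides `BufferedMutator` for that purpose. The class below is not HBase code: it is a stdlib-only stand-in (the name `BatchingWriter` and the `sink` callback are made up for illustration) that only shows how batching cuts the number of round trips.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical stand-in for a buffered writer; not an HBase class. */
final class BatchingWriter<T> implements AutoCloseable {
    private final int batchSize;
    // Each call to the sink models one round trip to the server.
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Queues one mutation and flushes once the buffer is full. */
    void write(T mutation) {
        buffer.add(mutation);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered as a single batch. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    /** Flushes the remaining mutations so nothing is lost on close. */
    @Override
    public void close() {
        flush();
    }
}
```

With a batch size of 100, writing 250 rows costs three simulated round trips instead of 250.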

completebulkload

This seems to take a long time (it never completes) and there is no real indication of why. Both IO and CPU show very low usage.

Am I missing something obvious?

I saw from your self-answer that you solved your problem, but for completeness I'd mention that there's another option: writing directly to HBase. We have a setup where we stream data into HBase, and with proper key and region splitting we get more than 15,000 1K records per second per node.
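The answer does not say which key design was used. One common way to get "proper key and region splitting" is to prefix each row key with a one-byte salt and pre-split the table on the salt values. The sketch below assumes that approach. `SaltedKeys` and its methods are hypothetical names, and the code uses only the Java standard library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: one-byte salting plus matching split points. */
final class SaltedKeys {
    private SaltedKeys() {}

    private static void checkBuckets(int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
    }

    /** Prefixes the row key with a bucket byte derived from the key's hash. */
    static byte[] salt(byte[] rowKey, int buckets) {
        checkBuckets(buckets);
        int bucket = Math.floorMod(Arrays.hashCode(rowKey), buckets);
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = (byte) bucket;
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
    }

    /** Returns buckets - 1 single-byte split keys, one boundary per bucket. */
    static List<byte[]> splitPoints(int buckets) {
        checkBuckets(buckets);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < buckets; i++) {
            splits.add(new byte[] {(byte) i});
        }
        return splits;
    }
}
```

The split points could be converted to a `byte[][]` and passed as split keys when creating the table, so that each bucket starts in its own region. The salt is deterministic, so a point read can recompute it from the original key, but a range scan over original keys has to fan out across all buckets.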

CompleteBulkLoad was the right answer. Per @DonaldMiner, I dug deeper and found that the CompleteBulkLoad process was running as "hbase", which resulted in a permission denied error when trying to move/rename/delete the source files. The implementation appears to retry for a long time before giving an error message; up to 30 minutes in our case.

Giving the hbase user write access to the files resolved the issue.
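To see why the "hbase" user was denied, it helps to recall that HDFS applies a POSIX-style check for non-superusers: the owner bits apply if the user owns the path, else the group bits if the user belongs to the path's group, else the "other" bits. The sketch below models only that rule. It ignores ACLs and the superuser bypass, and the class name and the example users and groups are hypothetical.

```java
import java.util.Set;

/** Hypothetical model of a POSIX-style write check; not a Hadoop class. */
final class WriteCheck {
    private WriteCheck() {}

    /**
     * Returns true if the user may write to a path with the given owner,
     * group, and octal mode (for example 0755).
     */
    static boolean canWrite(String user, Set<String> userGroups,
                            String owner, String group, int mode) {
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7;      // owner class
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7;      // group class
        } else {
            bits = mode & 7;             // everyone else
        }
        return (bits & 2) != 0;          // 2 is the write bit
    }
}
```

For example, output written by a job user "etl" with group "hadoop" and mode 0755 is not writable by "hbase" unless "hbase" owns it, or the mode grants write to a class that "hbase" falls into. The same check applies to the directories, since moving or deleting an entry requires write permission on its parent directory.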
