
Does it make sense to run an hbase mapreduce job with a million Scans?

Original post: 2014-02-11 21:56:29 | Tags: hadoop / mapreduce / hbase

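I have a dataset in HBase that is large enough that running a MapReduce job over the entire dataset takes a couple of hours. I would like to break the data down using precomputed indexes: once a day, map over the entire dataset and split it into several indexes: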

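A 1% sample of all users
All users who are participating in a particular A/B experiment
All users on the nightly prerelease channel
All users with a particular add-on (or whatever criterion we are interested in this week)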

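My thought was to store just a list of row IDs for the relevant records, so that later people can run small MapReduce jobs over only those rows. But a 1% sample is still 1 million rows of data, and I am not sure how to construct a MapReduce job over a list of a million rows.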
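As a rough illustration of the indexing idea described above, the daily pass can be thought of as one walk over the records that appends each row ID to every index whose criterion matches. The sketch below is plain Java with no HBase calls; `UserRecord`, its fields, and `IndexBuilder` are hypothetical names invented for the example, since the real table schema is not given here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical record shape; the actual schema is an assumption.
final class UserRecord {
    final String rowId;
    final String channel;
    final List<String> addons;

    UserRecord(String rowId, String channel, List<String> addons) {
        this.rowId = rowId;
        this.channel = channel;
        this.addons = addons;
    }
}

final class IndexBuilder {
    // One pass over the records: each index collects the row IDs
    // of the records that satisfy its predicate.
    static Map<String, List<String>> build(Iterable<UserRecord> records,
                                           Map<String, Predicate<UserRecord>> criteria) {
        Map<String, List<String>> indexes = new LinkedHashMap<>();
        for (String name : criteria.keySet()) {
            indexes.put(name, new ArrayList<>());
        }
        for (UserRecord record : records) {
            for (Map.Entry<String, Predicate<UserRecord>> criterion : criteria.entrySet()) {
                if (criterion.getValue().test(record)) {
                    indexes.get(criterion.getKey()).add(record.rowId);
                }
            }
        }
        return indexes;
    }
}
```

In a real deployment this pass would itself be the daily full-table MapReduce job, with the resulting ID lists written somewhere later jobs can read them; that part is not shown.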

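Does it make sense to create a table mapper job using initTableMapperJob(List<Scan> scans) if the query is going to be made up of a million different Scan objects? Are there other ways to do this so that I can still farm out the computation and I/O to the HBase cluster efficiently?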

1 Answer

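Don't do a million scans. If you have a million non-contiguous IDs, you could run a map/reduce job over the list of IDs using a custom input format, so that the list is divided into a reasonable number of partitions (I would guess 4x the number of your m/r slots, but that number is not based on anything). That would give you a million get operations, which is probably better than a million scans.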
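The partitioning step can be sketched in plain Java as follows. This is only the list-splitting logic that a custom input format might use to build its splits, not an input format itself; `IdListPartitioner` and its method names are made up for illustration, and the factor of four is the rule of thumb guessed at above rather than a tuned value.

```java
import java.util.ArrayList;
import java.util.List;

final class IdListPartitioner {
    // Splits ids into at most numPartitions contiguous chunks
    // whose sizes differ by at most one element.
    static <T> List<List<T>> partition(List<T> ids, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        List<List<T>> parts = new ArrayList<>();
        int n = ids.size();
        int count = Math.min(numPartitions, n); // never emit empty partitions
        int start = 0;
        for (int i = 0; i < count; i++) {
            // The first (n % count) partitions take one extra element.
            int size = n / count + (i < n % count ? 1 : 0);
            parts.add(new ArrayList<>(ids.subList(start, start + size)));
            start += size;
        }
        return parts;
    }

    // Rule of thumb from the answer: four partitions per m/r slot (a guess).
    static int suggestedPartitions(int mapReduceSlots) {
        return 4 * mapReduceSlots;
    }
}
```

With 25 slots this suggests 100 partitions, so a list of one million IDs would yield 10,000 IDs per partition. In an actual job, each partition would presumably back one input split, and the mapper would issue one get per ID in its split.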

If you are lucky enough to have a more reasonable number of contiguous ranges, then scans would be better than straight gets.
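One way to check whether an ID list collapses into a manageable number of contiguous ranges is to merge runs of consecutive IDs, as in the sketch below. It rests on assumptions not stated in the thread: that row keys are numeric IDs, and that consecutive numeric IDs are adjacent rows in the table (for example, with a fixed-width big-endian key encoding). `RangeCoalescer` and `Range` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class RangeCoalescer {
    // Half-open range [start, stopExclusive) of numeric IDs.
    static final class Range {
        final long start;
        final long stopExclusive;

        Range(long start, long stopExclusive) {
            this.start = start;
            this.stopExclusive = stopExclusive;
        }
    }

    // Merges a sorted, duplicate-free array of IDs into runs of consecutive values.
    // Assumes every ID is below Long.MAX_VALUE so that prev + 1 cannot overflow.
    static List<Range> coalesce(long[] sortedIds) {
        List<Range> ranges = new ArrayList<>();
        if (sortedIds.length == 0) {
            return ranges;
        }
        long start = sortedIds[0];
        long prev = start;
        for (int i = 1; i < sortedIds.length; i++) {
            long id = sortedIds[i];
            if (id != prev + 1) {
                // A gap closes the current run and opens a new one.
                ranges.add(new Range(start, prev + 1));
                start = id;
            }
            prev = id;
        }
        ranges.add(new Range(start, prev + 1));
        return ranges;
    }
}
```

If the number of resulting ranges is far smaller than the number of IDs, each range could become one scan bounded by its start and stop row; if it is close to the number of IDs, the get-based approach is likely the better fit.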

