
HBase scan vs MapReduce for on-the-fly computation

Original post: 2014-11-21 15:08:11. Tags: java, performance, hadoop, mapreduce, hbase

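I need to calculate an aggregate over an HBase table.

Say I have this HBase table: 'metadata', column family: M, column: n

Here the metadata object has a list of strings:

class Metadata {
    List<String> tags;
}

I need to calculate the count of tags, and I was thinking of using either MapReduce or a scan over HBase directly.

The result has to be returned on the fly. Which one can I use in this scenario: scan HBase and calculate the aggregate, or MapReduce?

MapReduce is ultimately going to scan HBase and calculate the count anyway.

What are the pros and cons of each approach?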

1 answer

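I suspect you are not aware of the pros and cons of HBase: it is not suited to computing real-time aggregations over large datasets.

Let's start by saying that MapReduce is itself a scheduled job, so you will not be able to return a response on the fly: the TaskTracker needs at least 15 seconds just to initialize the job.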

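In the end, the MapReduce job will do exactly the same thing: an HBase scan. The difference between running the scan directly and running it through MapReduce is only parallelization and data locality, which pay off when you have millions or billions of rows. If your queries only need to read a few thousand consecutive rows to aggregate them, then you can just do a scan and it will probably have an acceptable response time. For larger datasets, scanning at query time is simply not feasible.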
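To make the "few thousand consecutive rows" case concrete, here is a minimal sketch of a query-time aggregation over a contiguous key range. It is an in-memory stand-in and does not use the HBase client API: a `TreeMap` plays the role of the row-key-ordered table, and the class name `ScanAggregation`, the method `countTags`, and the `doc-00N` row keys are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for a table sorted by row key (assumption: not real HBase).
class ScanAggregation {

    // Counts tags over rows with keys in [startRow, stopRow),
    // which mirrors scanning a contiguous row-key range.
    static Map<String, Long> countTags(NavigableMap<String, List<String>> table,
                                       String startRow, String stopRow) {
        Map<String, Long> counts = new HashMap<>();
        for (List<String> tags : table.subMap(startRow, true, stopRow, false).values()) {
            for (String tag : tags) {
                counts.merge(tag, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        NavigableMap<String, List<String>> table = new TreeMap<>();
        table.put("doc-001", Arrays.asList("hbase", "scan"));
        table.put("doc-002", Arrays.asList("hbase", "mapreduce"));
        table.put("doc-003", Arrays.asList("hadoop"));

        // Only doc-001 and doc-002 are visited; the stop key is exclusive.
        Map<String, Long> counts = countTags(table, "doc-001", "doc-003");
        System.out.println(counts);
    }
}
```

The work done per query grows with the number of rows in the range, which is why this approach only holds up for small, contiguous ranges.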

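HBase is best suited to handling large volumes of atomic reads and writes. That lets you maintain these aggregations in real time, no matter how many pre-aggregated counters you need or how many requests you receive: with a proper row key design and split policy, you can scale to meet demand.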

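Think of it as a word count. You can store all the words in a list and count them at query time, or you can process the list at insert time and store the number of times each word is used in the document, as a global counter and in daily, monthly, yearly, per-country, and per-author tables (or even column families).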
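The insert-time alternative can be sketched as follows. This is again an in-memory illustration, not the HBase client API: a `ConcurrentHashMap` of `LongAdder` values stands in for atomically incremented counter cells, and the class name `TagCounters` and the key layout (`global|tag`, `day|date|tag`, `month|yyyy-MM|tag`) are assumptions made for the example, not a row key design taken from the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// In-memory stand-in for pre-aggregated counters (assumption: not real HBase).
class TagCounters {
    // Counter key -> count, e.g. "global|hbase" or "day|2014-11-21|hbase".
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Called on every insert: bump each counter the tags contribute to.
    // The day is expected in yyyy-MM-dd form (hypothetical convention).
    void recordInsert(String day, List<String> tags) {
        String month = day.substring(0, 7);
        for (String tag : tags) {
            increment("global|" + tag);
            increment("day|" + day + "|" + tag);
            increment("month|" + month + "|" + tag);
        }
    }

    private void increment(String key) {
        counters.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    // Query time is a single keyed lookup instead of a scan.
    long get(String key) {
        LongAdder adder = counters.get(key);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        TagCounters c = new TagCounters();
        c.recordInsert("2014-11-21", Arrays.asList("hbase", "scan"));
        c.recordInsert("2014-11-22", Arrays.asList("hbase"));
        System.out.println(c.get("global|hbase"));         // 2
        System.out.println(c.get("day|2014-11-21|hbase")); // 1
        System.out.println(c.get("month|2014-11|hbase"));  // 2
    }
}
```

The trade-off is more writes per insert in exchange for reads whose cost does not depend on how many rows were ever inserted.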


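Notice: Technical posts on this site follow the CC BY-SA 4.0 license. If you need to reproduce this content, please cite this site's URL or the address of the original post. For any questions, contact: yoyou2525@163.com.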
