
Hadoop mapper output to HBase table and a reducer

Original post: 2014-07-01 21:08:49. Tags: java, hadoop, mapreduce, hbase, multiple-tables

I am trying to write a MapReduce job that parses a CSV file, stores the data in HBase and performs a reduce function, all in one go. Ideally I would like:

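1. The mapper to output good records to the HBase table GOOD.
2. The mapper to output bad records to the HBase table BAD.
3. The mapper to send all the good data to a reducer, using a key.
4. A third table to be updated to indicate the presence of new data. This table will hold basic information about the data and the date. There will most likely be one or two records per CSV file.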

I know how to do 1 and 2 with HBase MultiTableOutputFormat, but I am unsure how to do 3 and 4.

Any pointers on how to do this are much appreciated.

I have a few thoughts on how to do this:

For 1 and 2, I would use ImmutableBytesWritable as the key, and MultiTableOutputFormat would take care of storing the records from the Mapper. But for 3 I would like the key to be Text.

For #4, should I do this in the Mapper by:

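Scanning the third HBase table for the entry and populating it if it is not there, otherwise skipping it? I don't like this because it feels very inefficient.
Or should I maintain a List in the Mapper and write it to HBase in the Mapper's cleanup method?
Is there a better way to do this?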
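
One way to picture the second option is a small per-task buffer that deduplicates entries during map() and is written out once from cleanup(). The sketch below is plain Java with no Hadoop or HBase dependency. The class name NewDataTracker, its methods, and the "source|date" row key layout are all assumptions made for illustration and do not come from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: collects distinct "new data" markers inside one map task. */
class NewDataTracker {
    // LinkedHashSet drops duplicates and keeps first-seen order.
    private final Set<String> pending = new LinkedHashSet<>();

    /** Would be called from map(): remember that data for this source and date was seen. */
    void record(String source, String date) {
        // Assumed row key layout for the third table: "<source>|<date>".
        pending.add(source + "|" + date);
    }

    /** Would be called once from cleanup(): return the distinct entries and clear the buffer. */
    List<String> drain() {
        List<String> out = new ArrayList<>(pending);
        pending.clear();
        return out;
    }
}
```

In a real Mapper, drain() would be called from cleanup(Context) and each returned entry turned into a Put against the third table. Because an HBase Put to an existing row writes a new cell version rather than failing, the existence scan from the first option is probably not needed, and repeated writes of the same row from several map tasks should be harmless.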

1 Answer

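The mapper reads the CSV by setting KeyValueTextInputFormat as the input format.
In the mapper code, add some logic to distinguish good records from bad ones, and write them to HBase with Put (an HBase API call).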

The handles for the HBase tables can be initialized in the mapper's setup method.

The good records can be passed on with context.write(key, value) and collected in the reducer.
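
As a rough illustration of the flow described in this answer, the sketch below classifies each CSV line, sends it to a GOOD or a BAD sink, and forwards good lines to the reducer under a text key. It is plain Java so that it runs without a cluster: RecordSink stands in for an HBase table handle opened in setup(), ReducerEmitter stands in for context.write(key, value), and the validity rule, the lineId parameter and the choice of the first field as the key are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for an HBase table handle; a real mapper would build a Put and send it to the table. */
interface RecordSink {
    void put(String rowKey, String line);
}

/** Stand-in for context.write(key, value) towards the reducer. */
interface ReducerEmitter {
    void write(String key, String value);
}

/** In-memory sink, used only so the sketch can run without HBase. */
class ListSink implements RecordSink {
    final List<String> rows = new ArrayList<>();

    @Override
    public void put(String rowKey, String line) {
        rows.add(rowKey + "=" + line);
    }
}

class CsvRouter {
    private final RecordSink good;
    private final RecordSink bad;
    private final ReducerEmitter emitter;
    private final int expectedFields;

    CsvRouter(RecordSink good, RecordSink bad, ReducerEmitter emitter, int expectedFields) {
        this.good = good;
        this.bad = bad;
        this.emitter = emitter;
        this.expectedFields = expectedFields;
    }

    /** Hypothetical validity rule: the expected number of fields and a non-empty first field. */
    static boolean isGood(String[] fields, int expected) {
        return fields.length == expected && !fields[0].trim().isEmpty();
    }

    /** Mirrors one map() call; lineId is an assumed unique id for the input line. */
    void map(String lineId, String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (isGood(fields, expectedFields)) {
            String key = fields[0].trim();
            good.put(key, line);      // store in GOOD
            emitter.write(key, line); // and pass on to the reducer
        } else {
            bad.put(lineId, line);    // store in BAD, never reaches the reducer
        }
    }
}
```

Writing to the tables directly from the mapper leaves the job's normal map output free for the reducer, so the reducer key can be Text regardless of what the tables use as row keys.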

