Error running a custom MapReduce through Hive

Question

I am trying to run a custom map-reduce through Hive. I created sample mapper and reducer classes for word count and followed the steps in this article: http://www.lichun.cc/blog/2012/06/wordcount-mapreduce-example-using-hive-on-local-and-emr/

create external table if not exists raw_lines(line string)
    ROW FORMAT DELIMITED
    stored as textfile
    location '/user/new_user/hive_mr_input';

I have added the sample word-count lines to the /user/new_user/hive_mr_input directory.

create external table if not exists word_count(word string, count int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
lines terminated by '\n' 
STORED AS TEXTFILE LOCATION '/user/new_user/hive_mr_output';

hive>
add file /home/new_user/hive/WordCountReducer.java;
add file /home/new_user/hive/WordCountMapper.java;

    from (
            from raw_lines
            map raw_lines.line        
            using '/user/new_user/hive/WordCountMapper.java'
            as word, count
            cluster by word) map_output
    insert overwrite table word_count
    reduce map_output.word, map_output.count
    using '/user/new_user/hive/WordCountReducer.java'
    as word,count;

When I execute the above command, I get this error:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.

I thought it might be because I used the '\t' delimiter when creating the table, so I changed the Mapper class and tried a comma-separated input file:

String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line,",");

I also changed the word_count table definition to use ',' (FIELDS TERMINATED BY ','), but I got the same error.

What is wrong with the code above?

Answer 1

The reason it doesn't work is that you are trying to use Java. In the example you point to, the author uses Python. See the documentation here.

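The script you provide as a custom transform must be executable, must read its input from standard input, and must write its output data to standard output. That means you can use practically any language, even bash. Other popular choices are Python (as in the article you linked) or Ruby.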
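To illustrate that contract in the asker's own language, here is a minimal sketch of a Java word-count mapper written as a stdin/stdout program rather than a Hadoop Mapper class. It assumes Hive's default transform row format: one row per line, columns separated by a tab, on both the input and output side. The class name mirrors the question's WordCountMapper, but the body is a hypothetical rewrite, not the asker's or the linked article's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.StringTokenizer;

// Hypothetical streaming mapper for a Hive MAP ... USING clause.
// Each stdin line is one raw_lines.line value; each stdout line is one
// output row "word<TAB>1" (Hive's default tab-separated transform format).
public class WordCountMapper {

    // Splits every input line on whitespace and emits one row per token.
    static void map(Reader input, Writer output) throws IOException {
        BufferedReader in = new BufferedReader(input);
        PrintWriter out = new PrintWriter(output);
        String line;
        while ((line = in.readLine()) != null) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Write "\n" explicitly instead of println(), whose line
                // separator is platform-dependent.
                out.print(tokenizer.nextToken() + "\t1\n");
            }
        }
        out.flush();
    }

    // Entry point Hive would launch: read rows from stdin, write rows to stdout.
    public static void main(String[] args) throws IOException {
        map(new InputStreamReader(System.in, StandardCharsets.UTF_8),
            new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

The tokenizing logic is kept separate from main so it can be exercised with in-memory readers and writers; only main touches the process's standard streams, which is the part Hive actually relies on.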

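Whichever language you choose, you must make sure that its interpreter, along with all the libraries the script needs, is available on every node; otherwise the script will fail.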

You are supplying Java source code, and that won't work: Hive does not compile the code for you.

You can use Java, but you have to build an executable jar. See this other post on how to do that.
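As an illustration of what such a Java program would contain on the reduce side, here is a minimal sketch of a word-count reducer that follows the same stdin/stdout contract. It relies on the CLUSTER BY word in the question's query, which distributes rows by word and sorts them within each reducer, so all rows for a given word arrive consecutively and a running total is enough. This is a hypothetical example, not the linked article's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming reducer for a Hive REDUCE ... USING clause.
// Input rows are "word<TAB>count", already grouped and sorted by word
// because the map output was CLUSTER BY word.
public class WordCountReducer {

    static void reduce(Reader input, Writer output) throws IOException {
        BufferedReader in = new BufferedReader(input);
        PrintWriter out = new PrintWriter(output);
        String currentWord = null;
        long currentCount = 0;
        String line;
        while ((line = in.readLine()) != null) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // skip malformed rows that have no tab separator
            }
            String word = line.substring(0, tab);
            long count = Long.parseLong(line.substring(tab + 1).trim());
            if (word.equals(currentWord)) {
                currentCount += count; // same word as the previous row: accumulate
            } else {
                if (currentWord != null) {
                    out.print(currentWord + "\t" + currentCount + "\n"); // flush finished word
                }
                currentWord = word;
                currentCount = count;
            }
        }
        if (currentWord != null) {
            out.print(currentWord + "\t" + currentCount + "\n"); // last word in the stream
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        reduce(new InputStreamReader(System.in, StandardCharsets.UTF_8),
               new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

Assuming a stdin/stdout WordCountMapper and this reducer are compiled and packaged into a jar (wordcount.jar is a hypothetical name) and shipped with add file /home/new_user/hive/wordcount.jar, the USING clauses would launch the JVM instead of pointing at .java sources, for example using 'java -cp wordcount.jar WordCountMapper' and using 'java -cp wordcount.jar WordCountReducer'. This assumes a java binary is on the PATH of every node, which is the same interpreter requirement described above.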

Solution 1
0 2016-02-23 03:14:34