
custom map-reduce through hive error

I was trying to run a custom map-reduce through Hive. I created sample mapper and reducer classes for wordcount. I followed the steps below from this article: http://www.lichun.cc/blog/2012/06/wordcount-mapreduce-example-using-hive-on-local-and-emr/

create external table if not exists raw_lines(line string)
    ROW FORMAT DELIMITED
    stored as textfile
    location '/user/new_user/hive_mr_input';

I have added sample lines for wordcount to the /user/new_user/hive_mr_input dir.

create external table if not exists word_count(word string, count int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
lines terminated by '\n' 
STORED AS TEXTFILE LOCATION '/user/new_user/hive_mr_output';

hive>
add file /home/new_user/hive/WordCountReducer.java;
add file /home/new_user/hive/WordCountMapper.java;

    from (
            from raw_lines
            map raw_lines.line        
            using '/user/new_user/hive/WordCountMapper.java'
            as word, count
            cluster by word) map_output
    insert overwrite table word_count
    reduce map_output.word, map_output.count
    using '/user/new_user/hive/WordCountReducer.java'
    as word,count;

When I executed the above command I got an error:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.

I thought it might be because of the '\t' delimiter I used in the table creation, so I made some changes in the Mapper class and tried using a file with commas:

String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line,",");

and changed the table structure to use ',' in the word_count table creation (FIELDS TERMINATED BY ','), but I got the same error.

What's wrong with the above code?

The reason it's not working is that you're trying to use Java. In the example you point to, the author is using Python. See the documentation here.

The script you supply as a custom transformation must be executable, and it must be able to accept input from standard input and write its output to standard output. For this reason, you can use practically any language, even bash. Other popular choices are Python, as in the article you linked, or Ruby, etc.
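For illustration only, a minimal sketch of such a mapper written as an ordinary stdin/stdout Java program might look like the following (the class name WordCountStreamingMapper and its exact behaviour are assumptions for this sketch, not code from the question):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

// Hypothetical stdin/stdout mapper for Hive TRANSFORM/MAP: Hive does not
// compile this for you; it has to run as an ordinary executable command.
public class WordCountStreamingMapper {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            // Hive sends each input row on standard input, one row per line.
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Emit tab-separated key/value pairs on standard output,
                // matching the "as word, count" clause of the Hive query.
                System.out.println(tokenizer.nextToken() + "\t" + 1);
            }
        }
    }
}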

Whatever language you choose, you have to make sure that the interpreter, along with all the needed libraries, is available on all the nodes; otherwise the script will fail.

You're supplying Java source code, and that won't work. Hive won't compile your code for you.

You can use Java, but you have to build an executable jar. See this other post on how to do that.
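As a rough sketch under the same assumptions, the matching reducer could read the tab-separated, already-clustered rows from standard input and sum the counts per word (again, WordCountStreamingReducer and the jar name mentioned afterwards are hypothetical, not from the question):

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical stdin/stdout reducer: rows arrive grouped by word thanks to
// the CLUSTER BY clause, so counts can be summed in a single pass.
public class WordCountStreamingReducer {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String currentWord = null;
        long sum = 0;
        String line;
        while ((line = in.readLine()) != null) {
            String[] parts = line.split("\t");
            String word = parts[0];
            long count = Long.parseLong(parts[1]);
            if (currentWord != null && !currentWord.equals(word)) {
                // A new word starts: flush the total for the previous one.
                System.out.println(currentWord + "\t" + sum);
                sum = 0;
            }
            currentWord = word;
            sum += count;
        }
        if (currentWord != null) {
            System.out.println(currentWord + "\t" + sum);
        }
    }
}

You would then package both classes into a jar (say wordcount.jar), ship it with add file, and point the USING clauses at real commands such as 'java -cp wordcount.jar WordCountStreamingMapper', provided the java runtime is available on every task node; the paths and class names here are illustrative, not taken from the original question.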
