
Hadoop Streaming task failure

I have a relatively simple program written in C++, and I have been using Hadoop Streaming for MapReduce jobs (my Hadoop distribution is Cloudera's).

Recently, I found that a lot of streaming tasks keep failing and being restarted by the TaskTracker, even though the job finishes successfully in the end. I tracked the user logs, and it seems some MapReduce tasks are getting zero input. Specifically, the error message looks like this:

HOST=null
USER=mapred
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |TCGA-06-0216-0000024576-0000008192   0   27743   10716|
Date: Sun Apr 29 15:55:51 EDT 2012
java.io.IOException: Broken pipe  

Sometimes the error rate is pretty high (nearly 50%). I don't think this is normal. Does anyone know:

a) What's going on?

b) How can I fix it?

Thanks

Does your data have a lot of characters in other languages (e.g., Chinese)?

If so, check your character encoding settings in:

(1) the JVM for your Hadoop cluster: it is likely set to UTF-8 by default;
(2) your mapper/reducer: make sure it emits characters in UTF-8 (or whichever encoding your JVM is set to).
