Getting an error while running MapReduce jobs in R

I just started integrating RHadoop. RStudio Server is integrated with Hadoop, but I get an error when running MapReduce jobs. The error occurs when I run the following lines of code:

library(rmr2)
a <- to.dfs(seq(from=1, to=500, by=3), output="/user/hduser/num")
b <- mapreduce(input=a, map=function(k,v){keyval(v,v*v)})

StackTrace:

15/03/24 21:13:47 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.2.0.jar] /tmp/streamjob4788227373090541042.jar tmpDir=null
15/03/24 21:13:48 INFO client.RMProxy: Connecting to ResourceManager at tungsten10/192.168.0.123:8032
15/03/24 21:13:48 INFO client.RMProxy: Connecting to ResourceManager at tungsten10/192.168.0.123:8032
15/03/24 21:13:49 INFO mapred.FileInputFormat: Total input paths to process : 1
15/03/24 21:13:50 INFO mapreduce.JobSubmitter: number of splits:2
15/03/24 21:13:50 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1427104115974_0009
15/03/24 21:13:50 INFO impl.YarnClientImpl: Submitted application application_1427104115974_0009
15/03/24 21:13:50 INFO mapreduce.Job: The url to track the job: http://XXX.XXX.XXX.XXX:8088/proxy/application_1427104115974_0009/
15/03/24 21:13:50 INFO mapreduce.Job: Running job: job_1427104115974_0009
15/03/24 21:14:02 INFO mapreduce.Job: Job job_1427104115974_0009 running in uber mode : false
15/03/24 21:14:03 INFO mapreduce.Job:  map 0% reduce 0%
15/03/24 21:14:07 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

15/03/24 21:14:08 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

15/03/24 21:14:15 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000001_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

15/03/24 21:14:16 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

15/03/24 21:14:20 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000001_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

15/03/24 21:14:21 INFO mapreduce.Job: Task Id : attempt_1427104115974_0009_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

15/03/24 21:14:25 INFO mapreduce.Job:  map 100% reduce 0%
15/03/24 21:14:26 INFO mapreduce.Job: Job job_1427104115974_0009 failed with state FAILED due to: Task failed task_1427104115974_0009_m_000001
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/03/24 21:14:26 INFO mapreduce.Job: Counters: 13
    Job Counters 
        Failed map tasks=7
        Killed map tasks=1
        Launched map tasks=8
        Other local map tasks=6
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=27095
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=27095
        Total vcore-seconds taken by all map tasks=27095
        Total megabyte-seconds taken by all map tasks=27745280
    Map-Reduce Framework
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
15/03/24 21:14:26 ERROR streaming.StreamJob: Job not Successful!
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1
15/03/24 21:14:30 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://XXX.XXX.XXX.XXX:8020/tmp/file10076f272b9a' to trash at: hdfs://XXX.XXX.XXX.XXX:8020/user/hduser/.Trash/Current

I have searched extensively for a solution to this problem but have not found one. As I am new to RHadoop, I am stuck. Can anyone please help me resolve this? I would be very thankful.

The error occurs because the HADOOP_STREAMING environment variable is not set in your code. Set it to the full path of the Hadoop streaming jar, including the jar file name. The R code below works fine for me.

R code (I'm using Hadoop 2.4.0 on Ubuntu):

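# Point the RHadoop packages at the Hadoop binary and the streaming jar.
# Adjust both paths to match your own installation.
Sys.setenv("HADOOP_CMD"="/usr/local/hadoop/bin/hadoop")
Sys.setenv("HADOOP_STREAMING"="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.4.0.jar")

library(rJava)
library(rhdfs)
# Initialise the HDFS connection
hdfs.init()
library(rmr2)

# Write the sequence to HDFS, then square each value in a map-only job
a <- to.dfs(seq(from=1, to=500, by=3), output="/user/hduser/num")
b <- mapreduce(input=a, map=function(k,v){keyval(v,v*v)})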
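
Before loading rmr2, it may help to confirm that both variables are set and point at files that exist. The paths above are for my Hadoop 2.4.0 install; the log in the question lists the streaming jar at /usr/lib/hadoop-mapreduce/hadoop-streaming-2.5.0-cdh5.2.0.jar, so the values on that cluster will probably differ. The sketch below uses only base R. `check_hadoop_env` is a hypothetical helper written for this answer, not a function from rmr2 or rhdfs.

```r
# Hypothetical helper (not part of rmr2 or rhdfs): report, for each variable,
# whether it is set and whether its value is the path of an existing file.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  values <- Sys.getenv(vars, unset = "", names = FALSE)
  data.frame(
    variable    = vars,
    value       = values,
    is_set      = nzchar(values),
    file_exists = nzchar(values) & file.exists(values),
    stringsAsFactors = FALSE
  )
}

status <- check_hadoop_env()
print(status)

# Name any variable that still needs fixing before library(rmr2) is loaded.
bad <- status$variable[!status$file_exists]
if (length(bad) > 0) {
  message("Unset or pointing at a missing file: ", paste(bad, collapse = ", "))
}
```

Both rows should show TRUE in the file_exists column before the job is submitted. This only rules out a missing or mistyped path, so a job can still fail for other reasons.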

Hope this helps.
