I'm trying to run a Hadoop Streaming job like so:
yarn jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.*.jar \
-files count.pl \
-input "/my_events/*.bz2" \
-output count-events \
-mapper "cut -f2,4 | grep foo | cut -f1" \
-combiner "perl count.pl -s | perl count.pl" \
-reducer "perl count.pl"
The count.pl script simply counts keys, looping over the input like so (simplified):
while (<>) {
    chomp;
    my ($k, $c) = split /\t/, $_, 2;
    $c ||= 1;
    $count{$k} += $c;
}
while (my ($k, $c) = each %count) {
    print "$k\t$c\n";
}
It fails, and in the Hadoop syslog output I see some very strange things like this. Note that it somehow contains the Perl script source, some 1's, and some bzipped data:
2014-03-26 19:04:20,595 WARN [main] org.apache.hadoop.mapred.YarnChild:
Exception running child : java.io.IOException: subprocess exited successfully
R/W/S=8193/81/0 in:4096=8193/2 [rec/s] out:40=81/2 [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 HOST=null
USER=kwilliams
HADOOP_USER=null
last tool output: |} 1 1rint "$k\t$c\n"; 1each %count) { 1ne $lastkey)) { 1��@p@P 0�H�l$�H��L�d$�L�l$�L�t$�H��(
H�GhH�wH��H��H�GhHc�H��H)�H��H���C����L�AH�L�$�J�4&�F��H�L�0E1��~H�EJ�t �F��H�D�hA��H��(H��A��H���
...%�����A��E��tRIc�H��H��L�s������EX ui 0|
Broken pipe
and the stderr output has:
Can't open |: Broken pipe at count.pl line 12.
It turns out this is a specific problem with using pipes in a Streaming combiner.
Unlike the mapper and reducer, which are allowed to have shell pipes in their commands, the combiner is not. Hadoop Streaming interprets the combiner as the following (pretend $data is the output of the mapper):
cat $data | perl 'count.pl' '-s' '|' 'perl' 'count.pl'
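So the count.pl script, which uses Perl's <> construct, first parses its command-line flags (handling the -s) and is then left with |, perl, and count.pl as ordinary arguments. Because <> treats any remaining arguments as input files, and falls back to standard input only when there are none, the script tries to open and read files called |, perl, and count.pl instead of reading $data. The unread standard input is consistent with the "Broken pipe" messages above.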
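To make the difference concrete, here is a small shell sketch of what a command sees when the pipe character arrives as a plain argument instead of being interpreted by a shell. The show_args function is a hypothetical helper written only for this illustration; it is not part of Hadoop or count.pl.

```shell
# show_args is a hypothetical helper: it prints each argument it receives,
# one per line, wrapped in brackets.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoting the pipe character hands it over as an ordinary argument, which
# mimics the argument list perl receives in the combiner command line above.
literal_args=$(show_args count.pl -s '|' perl count.pl)
printf '%s\n' "$literal_args"
```

Each of the five words, including the |, shows up as its own argument. No second process is started and nothing is piped anywhere.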
That is why all that strange content ends up in the syslog output, including lines from the count.pl script itself.
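One possible workaround, which I have not verified on a cluster, is to move the pipeline into a small wrapper script so that a real shell interprets the pipe. The file name combiner.sh is an assumption made up for this sketch, and the job options shown in the trailing comments are a guess at how it would be wired in.

```shell
# Write a hypothetical wrapper script (combiner.sh is a name chosen for this
# sketch) that contains the two-stage pipeline from the original -combiner.
cat > combiner.sh <<'EOF'
#!/bin/sh
# This file is run by a real shell, so the pipe below is treated as a pipe.
perl count.pl -s | perl count.pl
EOF
chmod +x combiner.sh

# The job would then ship the wrapper next to count.pl and name it as the
# combiner (assumption, untested; all other options as in the original job):
#   -files count.pl,combiner.sh \
#   -combiner "sh combiner.sh" \
```

With this layout the combiner command itself contains no pipe character, so there is nothing for Hadoop Streaming to pass through as a literal argument.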
I just thought this was a crazy enough circumstance that I'd better post it somewhere.