I have written a reducer that reads the output from the mapper. For each key it should create a new file named after the key, and all values belonging to that key should be written into that file.
My code is:
#!/usr/bin/env python
import sys

last_key = None  # initialize these variables
for input_line in sys.stdin:
    input_line = input_line.strip()
    data = input_line.split("\t")
    this_key = data[0]
    if len(data) == 2:
        value = data[1]
    else:
        value = None
    if last_key == this_key:
        if value:
            fp.write('{0}\n'.format(value))
    else:
        if last_key:
            fp.close()
        fp = open('%s.txt' % this_key, 'a')
        if value:
            fp.write('{0}\n'.format(value))
    if not last_key:
        fp = open('%s.txt' % this_key, 'a')
        if value:
            fp.write('{0}\n'.format(value))
    last_key = this_key
But it is not creating any files.
So my question is: which function should I use to create new files in HDFS?
There is no straightforward way to achieve this. You can follow one of the approaches below using MapReduce:
Approach 1: Using partitioner
Approach 2: if the number of files is small, use MultipleOutputs.
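Also note why your reducer appears to create nothing: in Hadoop streaming, open() writes to the local working directory of the reducer task on whatever node it runs, not to HDFS, so the files never show up where you look. One option (a sketch of my own, not code from the question; function and directory names are illustrative) is to group values per key locally and then copy the resulting files into HDFS with the hadoop fs -put command:

import os
import sys
from itertools import groupby

def write_key_files(lines, out_dir="."):
    # Hadoop streaming delivers reducer input sorted by key, so groupby
    # can collect all values of one key in a single pass.
    def parse(line):
        parts = line.rstrip("\n").split("\t", 1)
        return parts[0], (parts[1] if len(parts) == 2 else None)

    written = []
    for key, group in groupby((parse(l) for l in lines), key=lambda kv: kv[0]):
        path = os.path.join(out_dir, "%s.txt" % key)
        with open(path, "w") as fp:  # local file on the task node, NOT HDFS
            for _, value in group:
                if value is not None:
                    fp.write(value + "\n")
        written.append(path)
    return written

if __name__ == "__main__":
    for path in write_key_files(sys.stdin):
        # To place the file in HDFS you would still shell out, e.g.:
        #   subprocess.check_call(["hadoop", "fs", "-put", path, "/some/hdfs/dir/"])
        # ("/some/hdfs/dir/" is a placeholder target directory.)
        pass

This keeps the per-key grouping logic testable on its own; the hadoop fs -put step is only sketched in a comment because it depends on your cluster setup.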