
How to create per-user files with unique names in the reducer phase of the Hadoop MapReduce framework (in Python)

I have written a reducer that reads the output from the mapper. It should create a new file named after each key, and all the values belonging to the same key should be stored in that one file.

My code is:

#!/usr/bin/env python

import sys

last_key = None              # key seen on the previous input line
fp = None                    # handle for the current key's output file

for input_line in sys.stdin:

    input_line = input_line.strip()
    data = input_line.split("\t")
    this_key = data[0]
    if len(data) == 2:
        value = data[1]
    else:
        value = None
    if this_key != last_key:
        # key changed: close the previous file and open one named after the new key
        if fp:
            fp.close()
        fp = open('%s.txt' % this_key, 'a')
        last_key = this_key
    if value:
        fp.write('{0}\n'.format(value))

if fp:
    fp.close()               # flush and close the last file

But it does not create any files.

So my question is: which function should I use to create new files in HDFS?

There is no straightforward solution to achieve this. You may follow the approaches below to achieve it using MapReduce:

Approach 1: Using a partitioner

  1. Find the number of unique files needed, e.g. count the number of unique '%this_key%' values in the input.
  2. Set the number of reducers to the result of the previous step in the MapReduce driver [one file per reducer].
  3. Use a partitioner to send each map output key to a particular reducer.
  4. Have the reducer emit only %value%.
  5. At the end of the job you will have one key's values per file, and you can rename the reducer output files.
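Steps 1–3 above can be sketched in Python. This is only a conceptual illustration — in a real Hadoop Streaming job the partitioner is configured on the driver side (e.g. the built-in org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner), and the helper names here are made up:

```python
def count_unique_keys(lines):
    # Step 1: the number of distinct keys is the number of output
    # files needed, which becomes the reducer count in the driver (step 2).
    return len({line.split("\t", 1)[0] for line in lines})

def build_partition_table(keys):
    # Step 3: assign each distinct key its own reducer index, so every
    # reducer (and therefore every output file) receives exactly one key.
    return {key: i for i, key in enumerate(sorted(set(keys)))}
```

With the table above, a map output record with key k would be routed to reducer build_partition_table(...)[k], and after the job each part file can be renamed after its key (step 5).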

Approach 2: if the number of files is very small, use MultipleOutputs.
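As an aside, if the reducer really must write files into HDFS itself (rather than into its local working directory, which is where the open() calls in the question create them), one option is to shell out to the hdfs command-line client. A minimal sketch — the function name and paths are made up, and it assumes the hdfs client is on the PATH:

```python
import subprocess

def put_to_hdfs(local_path, hdfs_dir, dry_run=False):
    # Build the 'hdfs dfs -put' command; -f overwrites an existing file.
    cmd = ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]
    if dry_run:
        # Return the command without running it (useful for testing).
        return cmd
    subprocess.check_call(cmd)   # raises CalledProcessError on failure
    return cmd
```

For example, after fp.close() in the reducer one could call put_to_hdfs('%s.txt' % last_key, '/user/me/output/') (a hypothetical output directory).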
