
How to create per-user files with unique names in the reducer phase of the Hadoop MapReduce framework (in Python)

I have written a reducer that reads the output from the mapper. It should create a new file named after each key, and all the values belonging to the same key should be stored in that one file.

My code is:

#!/usr/bin/env python

import sys

last_key = None              # key seen on the previous input line
fp = None                    # handle for the current key's output file

for input_line in sys.stdin:

    input_line = input_line.strip()
    data = input_line.split("\t")
    this_key = data[0]
    if len(data) == 2:
        value = data[1]
    else:
        value = None
    if this_key != last_key:
        # key changed: close the previous file and open one named after the new key
        if fp:
            fp.close()
        fp = open('%s.txt' % this_key, 'a')
        last_key = this_key
    if value:
        fp.write('{0}\n'.format(value))

if fp:
    fp.close()               # flush and close the last file

But it does not create any files.

So my question is: which function should I use to create new files in HDFS?

There is no straightforward solution to achieve this. You may follow the approaches below to achieve it using MapReduce:

Approach 1: Using a partitioner

  1. Find the number of unique files needed, e.g. count the number of unique '%this_key%' values in the input.
  2. Set the number of reducers to the result of the previous step in the MapReduce driver [one file per reducer].
  3. Use a partitioner to send each map output key to a particular reducer.
  4. Have the reducer emit only %value%.
  5. At the end of the job you will have one key's values per file, and you can rename the reducer output files.
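Steps 1–3 above can be sketched in Python. This is only a conceptual illustration — in a real Hadoop Streaming job the partitioner is configured on the driver side (e.g. the built-in org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner), and the helper names here are made up:

```python
def count_unique_keys(lines):
    # Step 1: the number of distinct keys is the number of output
    # files needed, which becomes the reducer count in the driver (step 2).
    return len({line.split("\t", 1)[0] for line in lines})

def build_partition_table(keys):
    # Step 3: assign each distinct key its own reducer index, so every
    # reducer (and therefore every output file) receives exactly one key.
    return {key: i for i, key in enumerate(sorted(set(keys)))}
```

With the table above, a map output record with key k would be routed to reducer build_partition_table(...)[k], and after the job each part file can be renamed after its key (step 5).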

Approach 2: if the number of files is very small, use MultipleOutputs.
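As an aside, if the reducer really must write files into HDFS itself (rather than into its local working directory, which is where the open() calls in the question create them), one option is to shell out to the hdfs command-line client. A minimal sketch — the function name and paths are made up, and it assumes the hdfs client is on the PATH:

```python
import subprocess

def put_to_hdfs(local_path, hdfs_dir, dry_run=False):
    # Build the 'hdfs dfs -put' command; -f overwrites an existing file.
    cmd = ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]
    if dry_run:
        # Return the command without running it (useful for testing).
        return cmd
    subprocess.check_call(cmd)   # raises CalledProcessError on failure
    return cmd
```

For example, after fp.close() in the reducer one could call put_to_hdfs('%s.txt' % last_key, '/user/me/output/') (a hypothetical output directory).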
