My map function produces two key values, NY and Other, so each output record is either NY 1 or Other 1. These are the only two cases.
My map function:
#!/usr/bin/env python
import sys
import csv
import string

reader = csv.reader(sys.stdin, delimiter=',')
for entry in reader:
    if len(entry) == 22:
        registration_state = entry[16]
        print('{0}\t{1}'.format(registration_state, int(1)))
Now I need reducers to process the map output. My reduce function:
#!/usr/bin/env python
import sys
import string

currentkey = None
ny = 0
other = 0

# Input comes from STDIN (stream data that goes to the program)
for line in sys.stdin:
    # Remove leading and trailing whitespace
    line = line.strip()
    # Get key/value
    key, values = line.split('\t', 1)
    values = int(values)
    # Count the record under NY...
    if key == 'NY':
        ny = ny + 1
    # ...or under Other for any other key
    else:
        other = other + 1

# Output both totals once all input has been read
print('{0}\t{1}'.format('NY', ny))
print('{0}\t{1}'.format('Other', other))
With these, the MapReduce job produces two output files, each containing both NY and Other totals. For example, one contains NY 1248, Other 4677 and the other contains NY 0, Other 1000. This happens because two reducers split the map output between them, so each one generates its own result; merging the two output files gives the final result.
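The merge step described above can be sketched as follows. This is only an illustration: `merge_reducer_outputs` is a hypothetical helper, and the two strings stand in for the two output files, using the example totals quoted above.

```python
from collections import Counter


def merge_reducer_outputs(outputs, separator='\t'):
    """Sum the per-key totals found in several reducer outputs."""
    totals = Counter()
    for output in outputs:
        for line in output.splitlines():
            line = line.strip()
            if not line:
                continue  # skip blank lines
            key, value = line.split(separator, 1)
            totals[key] += int(value)
    return dict(totals)


# Stand-ins for the two reducer output files (example totals from above)
part_0 = 'NY\t1248\nOther\t4677\n'
part_1 = 'NY\t0\nOther\t1000\n'

merged = merge_reducer_outputs([part_0, part_1])
print(merged)
```

Because each reducer prints both labels, every output file carries a line for NY and a line for Other, even when one of the counts is zero.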
However, I would like to change my map or reduce function so that each reducer processes only one key: one reducer handles only the records whose key is NY, and the other handles only Other. I expect results like this, where one file contains:
NY 1248, Other 0; and the other: NY 0, Other 5677.
How can I adjust my functions to get the results I expect?
You probably need to use Python iterators and generators. An excellent example is given at this link. I have tried rewriting your code in the same style (not tested):
Mapper:
#!/usr/bin/env python
"""A more advanced Mapper, using Python iterators and generators."""
import csv
import sys


def main(separator='\t'):
    reader = csv.reader(sys.stdin, delimiter=',')
    for entry in reader:
        # Only rows with exactly 22 columns are processed
        if len(entry) == 22:
            registration_state = entry[16]
            print('%s%s%d' % (registration_state, separator, 1))


if __name__ == "__main__":
    main()
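Note that this mapper emits the raw state code as the key, so the reducers would see one key per state rather than just NY and Other. One possible adjustment, sketched below, is to collapse the state into the two labels inside the mapper; since the framework sends all records that share a key to the same reducer, each label is then totalled in one place. The names `normalize_state`, `map_rows` and `make_row` are hypothetical, the sample rows are made up, and column 16 is assumed to hold the registration state as in the question.

```python
import csv
import io


def normalize_state(state):
    """Collapse a registration state into 'NY' or 'Other'."""
    return 'NY' if state.strip() == 'NY' else 'Other'


def map_rows(lines, separator='\t'):
    """Yield 'key<separator>1' for every row with exactly 22 columns."""
    for entry in csv.reader(lines, delimiter=','):
        if len(entry) == 22:
            yield '{0}{1}{2}'.format(normalize_state(entry[16]), separator, 1)


def make_row(state):
    """Build a made-up 22-column CSV row with the state in column 16."""
    row = ['x'] * 22
    row[16] = state
    return ','.join(row)


# In-memory stand-in for sys.stdin; the short row is skipped
sample = io.StringIO('\n'.join(
    [make_row('NY'), make_row('NJ'), 'too,short', make_row('NY')]))
mapped = list(map_rows(sample))
for line in mapped:
    print(line)
```

With only two keys, the default hash-based partitioning may still route both keys to the same reducer, so getting one key per reducer would presumably also need the job's partitioning to be configured accordingly.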
Reducer:
#!/usr/bin/env python
"""A more advanced Reducer, using Python iterators and generators."""
from itertools import groupby
from operator import itemgetter
import sys


def read_mapper_output(file, separator='\t'):
    # Lazily yield [key, count] pairs, one per input line
    for line in file:
        yield line.rstrip().split(separator, 1)


def main(separator='\t'):
    # Reducer input arrives sorted by key, so groupby sees each key as one run
    data = read_mapper_output(sys.stdin, separator=separator)
    for current_word, group in groupby(data, itemgetter(0)):
        try:
            total_count = sum(int(count) for _, count in group)
            print("%s%s%d" % (current_word, separator, total_count))
        except ValueError:
            # count was not a number, so silently discard this item
            pass


if __name__ == "__main__":
    main()
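The reducer above relies on `itertools.groupby`, which only merges consecutive items with equal keys, so it depends on the input being sorted by key (Hadoop Streaming sorts the map output before it reaches a reducer). A minimal local sketch of that behaviour, with made-up pairs and a hypothetical `reduce_pairs` helper:

```python
from itertools import groupby
from operator import itemgetter


def reduce_pairs(pairs):
    """Sum counts per key; pairs are expected to be sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(int(count) for _, count in group)


# Made-up mapper output, in arrival order
mapped = [('NY', '1'), ('Other', '1'), ('NY', '1'),
          ('Other', '1'), ('Other', '1')]

# Sorting by key imitates the shuffle/sort phase before the reducer
shuffled = sorted(mapped, key=itemgetter(0))
result = list(reduce_pairs(shuffled))

# Without sorting, the same key shows up as several separate groups
unsorted_result = list(reduce_pairs(mapped))

print(result)
print(unsorted_result)
```

This can be handy for testing the mapper and reducer on a small file before submitting the job, for instance by piping the mapper output through `sort` into the reducer.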