I'm trying to write a MapReduce job in Python. I have a file that contains product information, and I want to count the number of products that belong to the same category and have the same version, producing output like this: <category, {count, version}>
My file looks like this:
product_name  rate   category  id  version
a             "3.0"  cat1      1   1
b             "2.0"  cat1      2   1
c             "4.0"  cat1      3   4
d             "1.0"  cat2      3   2
...
For example, for the two cat1 products with version 1, I expect:
<cat1, {2, 1}>
I wrote the following code, but I don't know how to count them in the combiner function.
from mrjob.job import MRJob
from mrjob.step import MRStep

class MRFrequencyCount(MRJob):

    def steps(self):
        return [
            MRStep(
                mapper=self.mapper_extract,
                combiner=self.combine_counts,
            )
        ]

    def mapper_extract(self, _, line):
        (product_name, rate, category, id, version) = line.split('*')
        yield category, (1, version)

    def combine_counts(self, category, countAndVersion):
        # Fails: each value is a (1, version) tuple, so sum() raises a TypeError.
        yield category, sum(countAndVersion)

if __name__ == '__main__':
    MRFrequencyCount.run()
The issue is the key you are creating. Since you are essentially grouping by category and version, you should send both as a composite key to the combiner, i.e. have the mapper emit ((category, version), 1). The combiner and reducer then only have to sum plain integer counts, and the reducer can break the composite key back apart and emit the data in the desired format.
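To illustrate why the key matters, here is a small plain-Python sketch (no mrjob involved) using `collections.Counter` on the four sample rows from the question. Keying by category alone merges the version 1 and version 4 products of cat1 into a single count, while the composite (category, version) key keeps them apart. It also shows that calling `sum()` on (1, version) tuples, as the original combiner does, raises a `TypeError`. The variable names are illustrative only.

```python
from collections import Counter

# Sample rows from the question: (product_name, rate, category, id, version).
rows = [
    ("a", "3.0", "cat1", "1", "1"),
    ("b", "2.0", "cat1", "2", "1"),
    ("c", "4.0", "cat1", "3", "4"),
    ("d", "1.0", "cat2", "3", "2"),
]

# Key = category only: products with different versions are counted together.
by_category = Counter(category for _, _, category, _, _ in rows)

# Key = (category, version): each version of a category gets its own count.
by_category_version = Counter(
    (category, version) for _, _, category, _, version in rows
)

# What the original combiner does for cat1: sum() starts at 0 and fails on 0 + (1, "1").
combiner_error = None
try:
    sum([(1, "1"), (1, "1"), (1, "4")])
except TypeError as exc:
    combiner_error = exc

print(by_category["cat1"])                  # all cat1 products, versions mixed
print(by_category_version[("cat1", "1")])   # only cat1 products with version 1
print(combiner_error)
```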
from mrjob.job import MRJob
from mrjob.step import MRStep

class MRFrequencyCount(MRJob):

    def steps(self):
        return [
            MRStep(
                mapper=self.mapper_extract,
                combiner=self.combine_counts,
                reducer=self.reduce_counts
            )
        ]

    def mapper_extract(self, _, line):
        # Emit the composite (category, version) key with a count of 1.
        (product_name, rate, category, id, version) = line.split('*')
        yield (category, version), 1

    def combine_counts(self, cat_version, counts):
        # Pre-aggregate the counts for each (category, version) on the mapper side.
        yield cat_version, sum(counts)

    def reduce_counts(self, cat_version, counts):
        # The tuple key arrives as a JSON list, which unpacks the same way.
        category, version = cat_version
        final = sum(counts)
        yield category, (final, version)

if __name__ == '__main__':
    MRFrequencyCount.run()
Test input (fields separated by *):
a*3.0*cat1*1*1
b*2.0*cat1*2*1
c*4.0*cat1*3*4
d*1.0*cat2*3*2
Output:
"cat2" [1, "2"]
"cat1" [1, "4"]
"cat1" [2, "1"]
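If you want to check the mapper/combiner/reducer logic without mrjob or Hadoop, the following is a rough local simulation in plain Python. It is a sketch under simplifying assumptions: the helper names `group_and_apply` and `run_local` are hypothetical, the shuffle/sort phase is imitated with `sorted()` plus `itertools.groupby`, and the combiner runs once over all mapped pairs, whereas on a real cluster it runs separately on each mapper's output. Results should match the output above, although the order of lines across keys can differ, and mrjob prints the (count, version) tuples as JSON lists.

```python
from itertools import groupby
from operator import itemgetter

def mapper_extract(line):
    # Same logic as the mrjob mapper: emit ((category, version), 1).
    product_name, rate, category, _id, version = line.strip().split("*")
    yield (category, version), 1

def combine_counts(cat_version, counts):
    # Partial sum per (category, version).
    yield cat_version, sum(counts)

def reduce_counts(cat_version, counts):
    # Split the composite key and emit category -> (count, version).
    category, version = cat_version
    yield category, (sum(counts), version)

def group_and_apply(pairs, func):
    # Imitate shuffle/sort: sort by key, group equal keys, call func once per key.
    out = []
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        out.extend(func(key, (value for _, value in group)))
    return out

def run_local(lines):
    mapped = [pair for line in lines for pair in mapper_extract(line)]
    combined = group_and_apply(mapped, combine_counts)
    return group_and_apply(combined, reduce_counts)

sample = ["a*3.0*cat1*1*1", "b*2.0*cat1*2*1", "c*4.0*cat1*3*4", "d*1.0*cat2*3*2"]
result = run_local(sample)
for category, value in result:
    print(category, value)
```

With mrjob installed, the real job can be run locally on a file with the usual `python your_job.py input.txt` invocation (the file names here are placeholders).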