Understanding flow of execution of Python code

I'm trying to do a Python home assignment from Data Manipulation at Scale: Systems and Algorithms on Coursera. I have trouble understanding the base code that was presented as an example of the MapReduce algorithm. I would be grateful for help understanding it in two places; details below.

I tried to go step by step through the code flow of the two files below after running the command:

python wordcount.py 'data/books.json'
  1. File wordcount.py is opened
  2. mr = MapReduce.MapReduce() - the mr object is created
  3. def __init__(self): part from MapReduce.py is executed
  4. We come back to wordcount.py
  5. Functions def mapper(record): and def reducer(key,list_of_values): are created but for the time being without execution
  6. Python goes to if __name__ == '__main__':
  7. inputdata = open(sys.argv[1]) - the json file is opened and assigned to a variable
  8. mr.execute(inputdata, mapper, reducer) - a call to the method from MapReduce.py.

And here is my first question: we haven't defined a mapper or reducer variable/object so far. Is it just a null/no value passed to this function, or did we somehow define these variables before and I missed it?

  1. Later we move to def execute(self, data, mapper, reducer): in MapReduce.py
  2. And there we have mapper(record).

So this is a reference to the function in wordcount.py, am I right? But if we refer to a function from a different file, shouldn't we use import at the beginning of the file and state which file the function comes from?

(...) further code execution

wordcount.py file:

import MapReduce
import sys

"""
Word Count Example in the Simple Python MapReduce Framework
"""

mr = MapReduce.MapReduce()

# =============================
# Do not modify above this line

def mapper(record):
    # key: document identifier
    # value: document contents
    key = record[0]
    value = record[1]
    words = value.split()
    for w in words:
      mr.emit_intermediate(w, 1)

def reducer(key, list_of_values):
    # key: word
    # value: list of occurrence counts
    total = 0
    for v in list_of_values:
      total += v
    mr.emit((key, total))

# Do not modify below this line
# =============================
if __name__ == '__main__':
  inputdata = open(sys.argv[1])
  mr.execute(inputdata, mapper, reducer)

MapReduce.py file:

import json

class MapReduce:
    def __init__(self):
        self.intermediate = {}
        self.result = []

    def emit_intermediate(self, key, value):
        self.intermediate.setdefault(key, [])
        self.intermediate[key].append(value)

    def emit(self, value):
        self.result.append(value) 

    def execute(self, data, mapper, reducer):
        for line in data:
            record = json.loads(line)
            mapper(record)

        for key in self.intermediate:
            reducer(key, self.intermediate[key])

        #jenc = json.JSONEncoder(encoding='latin-1')
        jenc = json.JSONEncoder()
        for item in self.result:
            print(jenc.encode(item))

Thank you in advance for help with that.

In Python everything is an object, and that includes functions, so you can pass a functionA as an argument to another functionB (or to a class, or anything else). If functionB expects you to do that, it will assume you gave it a function with the right signature and proceed as normal.

In your case

mr.execute(inputdata, mapper, reducer)

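here mapper and reducer are the functions previously defined with def (step 5 in your list). They are passed as arguments to the method execute of the instance mr of the class MapReduce and, as you can see, that method uses them as the functions it expects. MapReduce.py does not need to import anything from wordcount.py, because execute receives the function objects themselves through its parameters.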
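The point above can be sketched in a few lines. The names apply_twice and add_three are made up for this illustration and are not part of the course code:

```python
def apply_twice(func, value):
    # `func` is just a local name bound to whatever function object was passed in.
    # apply_twice never needs to know where that function was defined.
    return func(func(value))


def add_three(x):
    return x + 3


# No parentheses after add_three: we pass the function itself, not its result.
result = apply_twice(add_three, 10)
print(result)  # 16
```

Writing add_three without parentheses is what makes the difference: add_three(10) would call the function, while add_three alone only names the function object.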

Thanks to this you can, as that code shows, write generic code that performs some computation and can be reused in a similar way by many applications, by giving the user the option of supplying his/her own functions.
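As a rough sketch of that reuse, here is a stripped-down, hypothetical stand-in for execute that works on an in-memory list instead of a JSON file. The names run_job, word_mapper, length_mapper and sum_reducer are assumptions made for this example, and unlike the course code the mapper receives the emit function as a second argument:

```python
def run_job(records, mapper, reducer):
    # Hypothetical stand-in for MapReduce.execute: it only calls what it was given.
    intermediate = {}

    def emit_intermediate(key, value):
        intermediate.setdefault(key, []).append(value)

    # Map phase: the user-supplied mapper decides which keys and values to emit.
    for record in records:
        mapper(record, emit_intermediate)

    # Reduce phase: the user-supplied reducer collapses each list of values.
    return {key: reducer(key, values) for key, values in intermediate.items()}


def word_mapper(record, emit):
    doc_id, text = record
    for word in text.split():
        emit(word, 1)


def length_mapper(record, emit):
    doc_id, text = record
    for word in text.split():
        emit(len(word), 1)


def sum_reducer(key, values):
    return sum(values)


docs = [["d1", "a rose is a rose"], ["d2", "is it"]]
word_counts = run_job(docs, word_mapper, sum_reducer)      # counts per word
length_counts = run_job(docs, length_mapper, sum_reducer)  # counts per word length
print(word_counts)
print(length_counts)
```

The same run_job serves both jobs; only the functions handed to it change.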

A much more general example of this is the built-in function map. It receives a function that does something, and map doesn't care what that function does or where it comes from, only that it accepts as many arguments as map itself receives (other than the function) and returns a value, which is used to build the new list of results.
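A small illustration of map, using two made-up helper functions, square and add. The list() call is there because in Python 3 map returns a lazy iterator, whereas in Python 2 it returns a list directly:

```python
def square(x):
    return x * x


def add(x, y):
    return x + y


# One iterable is passed, so the function must accept one argument.
squares = list(map(square, [1, 2, 3]))

# Two iterables are passed, so the function must accept two arguments.
sums = list(map(add, [1, 2, 3], [10, 20, 30]))

print(squares)  # [1, 4, 9]
print(sums)     # [11, 22, 33]
```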
