
Why do we need setup() method in MapReduce when we can initialize parameters in map() or reduce()?

Original post, 2016-10-28 17:34:18 · Tags: java, hadoop, mapreduce

I am new to Hadoop and to the MapReduce paradigm in general. I searched the web extensively for information on overriding the setup() method in the Mapper class to access the configuration object. But from what I read, it seems that the setup() method is anyways called every time a task is run.

So why is there a need for a separate method to access the configuration object and initialize parameters? Why can't we do the same directly in the map() or reduce() methods?

Although both approaches produce the required output in the end, is there a performance factor to consider when choosing one over the other? Thanks in advance.

4 answers

In my opinion, the answer lies not in Hadoop but in the programming paradigm. It is always good to separate the different parts of the business logic, and setting up the running environment is a different concern from running the map itself.

Imagine a scenario in which you want to run multiple calculations on the same data. If your jobs share a parent class, it is better to perform the common setup steps there, by overriding a separate method, than to repeat them in every job.

The design simply encourages a practice you would adopt anyway.
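As a rough illustration of this answer's point, here is a plain-Java sketch in which a parent class performs the shared setup and two jobs supply only their own per-record logic. It does not use Hadoop's Mapper class; every class name and the stop-word list are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the idea, NOT Hadoop's Mapper. The shared preparation
// lives in the parent's setup(); each job only supplies map().
abstract class BaseStopWordTask {
    protected Set<String> stopWords;                  // shared state, prepared once
    protected final List<String> output = new ArrayList<>();

    // Common setup phase inherited by every job (hypothetical stop-word list).
    protected void setup() {
        stopWords = new HashSet<>(List.of("the", "a", "of"));
    }

    // Each job implements its own per-record calculation.
    protected abstract void map(String record);

    // Simplified driver: setup once, then map every record.
    final List<String> run(List<String> records) {
        setup();
        for (String r : records) {
            map(r);
        }
        return output;
    }
}

// Job 1: emit every word that is not a stop word.
class KeywordTask extends BaseStopWordTask {
    @Override
    protected void map(String record) {
        for (String w : record.toLowerCase().split("\\s+")) {
            if (!stopWords.contains(w)) output.add(w);
        }
    }
}

// Job 2: emit the number of non-stop words in each record.
class KeywordCountTask extends BaseStopWordTask {
    @Override
    protected void map(String record) {
        long n = 0;
        for (String w : record.toLowerCase().split("\\s+")) {
            if (!stopWords.contains(w)) n++;
        }
        output.add(Long.toString(n));
    }
}

class BaseClassDemo {
    public static void main(String[] args) {
        List<String> data = List.of("The art of war", "A tale of two cities");
        System.out.println(new KeywordTask().run(data));      // [art, war, tale, two, cities]
        System.out.println(new KeywordCountTask().run(data)); // [2, 3]
    }
}
```

Adding a third calculation over the same data means writing one more map() override; the setup code is not duplicated.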

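Without setup(), you would have to check inside map() or reduce() whether the parameters have already been initialized. Separating the initialization phase from the actual mapping logic simplifies the initialization process.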
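A minimal plain-Java sketch of the difference, assuming a hypothetical lookup table as the parameter to initialize. It models the idea only and is not the Hadoop API: the first class has to test a flag on every record, while the second moves the work into a separate setup() phase.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Without a separate setup() phase, map() must guard its own
// initialization with a flag that is checked on every record.
class GuardedTask {
    private boolean initialized = false;
    private Map<String, String> lookup;     // hypothetical parameter table
    int initCalls = 0;
    int guardChecks = 0;

    String map(String record) {
        guardChecks++;                      // the check itself runs once per record
        if (!initialized) {
            lookup = loadLookup();
            initCalls++;
            initialized = true;
        }
        return lookup.getOrDefault(record, record);
    }

    static Map<String, String> loadLookup() {
        Map<String, String> m = new HashMap<>();
        m.put("US", "United States");
        m.put("FR", "France");
        return m;
    }
}

// Same logic with initialization moved into its own phase, as setup() allows.
class SetupTask {
    private Map<String, String> lookup;
    int initCalls = 0;

    void setup() {                          // called once, before any record
        lookup = GuardedTask.loadLookup();
        initCalls++;
    }

    String map(String record) {             // pure per-record logic, no flag
        return lookup.getOrDefault(record, record);
    }

    public static void main(String[] args) {
        List<String> records = List.of("US", "FR", "DE");

        GuardedTask g = new GuardedTask();
        for (String r : records) g.map(r);
        System.out.println(g.initCalls + " init, " + g.guardChecks + " checks"); // 1 init, 3 checks

        SetupTask s = new SetupTask();
        s.setup();
        for (String r : records) System.out.println(s.map(r));
    }
}
```

Both variants initialize only once, but the guarded version mixes initialization into the per-record path, which is exactly what the separate setup() hook avoids.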

I'm not sure if I'm right, but as far as I understand, map() and reduce() are executed on nodes in a distributed network, and those nodes have no knowledge of the whole system. So what you can access inside the map() and reduce() methods is not what is configured on the main node. A node cannot simply have access to the whole configuration, because that would mean staying connected to the main node the whole time.

Re: "it seems that the setup() method is anyways called every time a task is run."

Each Map or Reduce task processes a number of records when it runs. The map() method is called once for every record being processed (and reduce() once for every key group). The setup() method, however, runs only once per task. This gives you the opportunity to optimize the workflow by initializing configuration and resources (such as a database connection or a reference file) only once for all the records processed by that task.
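To make the cost argument concrete, here is a simplified plain-Java model (not Hadoop itself) that splits nine records across three tasks and counts how often an initialization runs in each placement. expensiveInit() is a hypothetical stand-in for opening a connection or reading a reference file.

```java
import java.util.List;

// Simplified model: each "task" processes one split of records.
class InitPlacementDemo {
    static int expensiveInits = 0;

    // Stand-in for opening a DB connection or reading a reference file.
    static void expensiveInit() {
        expensiveInits++;
    }

    // Variant A: initialize inside map(), i.e. once per record.
    static void runTaskInitInMap(List<String> split) {
        for (String record : split) {
            expensiveInit();
            // ... process record ...
        }
    }

    // Variant B: initialize in setup(), i.e. once per task.
    static void runTaskInitInSetup(List<String> split) {
        expensiveInit();                    // plays the role of setup()
        for (String record : split) {
            // map(record): per-record work only
        }
    }

    static int countInits(List<List<String>> splits, boolean inSetup) {
        expensiveInits = 0;
        for (List<String> split : splits) {
            if (inSetup) runTaskInitInSetup(split); else runTaskInitInMap(split);
        }
        return expensiveInits;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("r1", "r2", "r3", "r4"),
                List.of("r5", "r6", "r7"),
                List.of("r8", "r9"));
        System.out.println("init in map():   " + countInits(splits, false)); // 9
        System.out.println("init in setup(): " + countInits(splits, true));  // 3
    }
}
```

With initialization in map(), the count grows with the number of records; with initialization in setup(), it grows only with the number of tasks. That is also why setup() being "called every time a task is run" is cheap: a task typically handles many records.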

Similarly, the API provides a callback named cleanup() where you can release those resources. It is invoked once the task has finished processing all the records allocated to it.
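A sketch of the full lifecycle under stated assumptions: plain Java, loosely following the shape of Hadoop's default Mapper.run() (setup() once, map() per record, cleanup() at the end), with FakeConnection as a hypothetical resource. Putting cleanup() in a finally block is this sketch's own choice, so the resource is released even when a record fails; check the Mapper.run() of your Hadoop version for its exact behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource that records what happens to it.
class FakeConnection implements AutoCloseable {
    final List<String> log;
    FakeConnection(List<String> log) { this.log = log; log.add("open"); }
    void write(String s) { log.add("write " + s); }
    @Override public void close() { log.add("close"); }
}

class LifecycleTask {
    final List<String> log = new ArrayList<>();
    private FakeConnection conn;

    void setup() { conn = new FakeConnection(log); }       // acquire once per task

    void map(String record) {
        if (record.isEmpty()) throw new IllegalArgumentException("empty record");
        conn.write(record);                                 // reuse for every record
    }

    void cleanup() { if (conn != null) conn.close(); }     // release once per task

    void run(List<String> records) {
        setup();
        try {
            for (String r : records) map(r);
        } finally {
            cleanup();                                      // runs even if map() throws
        }
    }

    public static void main(String[] args) {
        LifecycleTask ok = new LifecycleTask();
        ok.run(List.of("a", "b"));
        System.out.println(ok.log);          // [open, write a, write b, close]

        LifecycleTask failing = new LifecycleTask();
        try {
            failing.run(List.of("a", "", "c"));
        } catch (IllegalArgumentException e) {
            System.out.println(failing.log); // [open, write a, close]
        }
    }
}
```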

Related questions

Why do we need default constructor when we can initialize data members in java?

Do we need an interface/contract if we cannot generalize method parameters

Why do we need to convert objects into a Map?

Why can not we call servlet constructor instead of init method to initialize the config parameters?

Why do we need getters when we create ORM class?

Why do we need to extend when using generics in Java if we can just use the original type?

Can we put some computation task inside setup method of mapper class in mapreduce code

Do we need to initialize @mock object

Why do we need to explicitly set OutputKey/InputKey Class in MapReduce job?

Why do we need Set and Map for Java Enum

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

4 answers

solution1 0 2016-10-28 17:54:25
solution2 0 2016-10-28 17:55:19
solution3 0 2016-10-28 18:08:44
solution4 0 accepted 2016-10-29 23:35:04