
Loading same file for each mapper

Let's say we have 10 data points and 5 mappers, and the goal is to compute the distances between the points. Normally this takes O(N^2), since every pair of points must be compared.

What I want to do is load the whole file containing the data points into each mapper and have each mapper operate on different points. For example, mapper #1 would calculate the distances of points 1 and 2 against all the other points, mapper #2 the distances of points 3 and 4 against all the other points, and so on.

I came across this algorithm in a paper, but it gave no specific way to implement it. Any ideas or suggestions on how to load the whole file into each mapper, or how to make each mapper operate on a specific range of indices in the file, would be much appreciated.

Take a look at this paper, which suggests using the "block nested loop" join (Section 3). It is slightly different from what you ask, but can easily be extended to match your needs: if you treat both R and S as one source, then it ends up comparing all points to all other points, as you require.

For your requirements, you don't need to implement the second MapReduce job that keeps only the top-k results.

In Hadoop 1.2.0 (old API), you can get the total number of mappers by reading the mapred.map.tasks property with conf.get("mapred.map.tasks"), and the index of the current mapper with conf.get("mapred.task.partition").
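A minimal sketch of reading both properties inside a mapper's configure() method (old org.apache.hadoop.mapred API; the class name IndexAwareMapper is illustrative):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class IndexAwareMapper extends MapReduceBase {

    private int totalMappers; // total number of map tasks in the job
    private int mapperIndex;  // 0-based index of this map task

    @Override
    public void configure(JobConf conf) {
        totalMappers = conf.getInt("mapred.map.tasks", 1);
        mapperIndex = conf.getInt("mapred.task.partition", 0);
    }
}
```

With these two values, each mapper can pick its own slice of points, e.g. every point i with i % totalMappers == mapperIndex.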

However, to answer your question on how to get the same file for all mappers, you can use the Distributed Cache.
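As a rough sketch of the whole approach, assuming the points file lives at /user/hadoop/points.txt (a hypothetical path) and stores one point per line as comma-separated coordinates: the driver registers the file with DistributedCache.addCacheFile(new URI("/user/hadoop/points.txt"), conf), and each mapper loads the cached copy in configure() and compares the points from its own input split against the full list:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class PairwiseDistanceMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, DoubleWritable> {

    private final List<double[]> allPoints = new ArrayList<double[]>();

    @Override
    public void configure(JobConf conf) {
        try {
            // Local copies of the files the driver registered with
            // DistributedCache.addCacheFile(...).
            Path[] cached = DistributedCache.getLocalCacheFiles(conf);
            BufferedReader reader =
                    new BufferedReader(new FileReader(cached[0].toString()));
            String line;
            while ((line = reader.readLine()) != null) {
                allPoints.add(parsePoint(line));
            }
            reader.close();
        } catch (IOException e) {
            throw new RuntimeException("Cannot read cached points file", e);
        }
    }

    @Override
    public void map(LongWritable offset, Text value,
                    OutputCollector<Text, DoubleWritable> out,
                    Reporter reporter) throws IOException {
        // 'value' is one point from this mapper's input split; compare it
        // against every point loaded from the cached copy of the full file.
        double[] p = parsePoint(value.toString());
        for (double[] q : allPoints) {
            out.collect(new Text(value.toString()),
                        new DoubleWritable(euclidean(p, q)));
        }
    }

    private static double[] parsePoint(String line) {
        String[] parts = line.split(",");
        double[] point = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            point[i] = Double.parseDouble(parts[i].trim());
        }
        return point;
    }

    private static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

Because each mapper's input split covers a disjoint subset of the points, every point is still compared against the complete cached list exactly once; if you want the exact index-based assignment from your question instead, combine this with the mapred.task.partition index shown above.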
