
Can a mapper write to multiple files?

I am new to Hadoop and MapReduce, and I am using an old version of Hadoop, 0.19. I have a program that reads a file/Excel and gives me the column contents as lists of places, locations, names, etc.

Let's assume my input file is divided into 2 parts, each handled by its own mapper. Each of these mappers will give me a list of the above-mentioned entities.

My question is:

  1. How do I keep track of the data and save the lists of places and names separately for each file from each mapper? How will the reducer recognize these files and produce one consolidated list of places and another of names for each file?

Say, for Doc-1:

list of places from mapper 1: NY 1, US 2
list of names from mapper 1: James 3, Ron 8
list of places from mapper 2: NY 6, UK 5
list of names from mapper 2: Kate 9

Something like this.

How do I save the output from each mapper separately for each type of entity, such as name or place?

How will the reducer recognize and reduce only the names to produce a final list, or only the locations to produce a final list, for that file?

Please help me with this, and let me know of any methods in Java that would help me do it.

If this is a map-only job, there will be the same number of output files as there are mappers. If this is a MapReduce job, you can specify the number of reducers. Provide a Partitioner that sends the data from a specific mapper to a specific reducer. If you are not sure of the number of mappers, make the number of reducers slightly higher than the total number of mappers and only use the first n reducers from the Partitioner.
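The routing idea above can be sketched in plain Java without any Hadoop dependency. This is only an illustration under stated assumptions: the key layout "<sourceIndex>|<entityType>|<value>" and the class name SourcePartitionSketch are hypothetical and not part of the original question or answer. In a real job on the old API, the same logic would presumably go in a class implementing org.apache.hadoop.mapred.Partitioner, whose getPartition method receives the key, the value, and the number of partitions.

```java
public class SourcePartitionSketch {

    // Assumed (hypothetical) key layout emitted by each mapper:
    //   "<sourceIndex>|<entityType>|<value>", for example "1|place|NY".
    // The leading number identifies the mapper (or input file) the record came from.
    static int getPartition(String key, int numPartitions) {
        int sep = key.indexOf('|');
        int sourceIndex = Integer.parseInt(key.substring(0, sep));
        // Records from source i go to reducer i; the modulo keeps the result
        // in range if there are more sources than reducers.
        return sourceIndex % numPartitions;
    }

    public static void main(String[] args) {
        int numReducers = 3; // slightly more reducers than the 2 mappers in the question
        String[] keys = {"0|place|NY", "0|name|James", "1|place|NY", "1|name|Kate"};
        for (String key : keys) {
            System.out.println(key + " -> reducer " + getPartition(key, numReducers));
        }
    }
}
```

Because the entity type stays inside the key, each reducer still sees names and places as separate keys, so it can total them independently for its source. The number of reducers itself would be set on the job configuration (JobConf.setNumReduceTasks in the old API).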
