Can a mapper write to multiple files?
I am new to Hadoop and MapReduce, and I am using an old version of Hadoop (0.19). I have a program that reads a file (an Excel sheet) and returns the contents of its columns as lists of places, locations, names, etc.
Let's assume my input file is divided into two splits, so two mappers run. Each of these mappers gives me lists of the entities mentioned above.
My question is:
Say, for Doc-1:
list of places from mapper 1: NY 1, US 2
list of names from mapper 1: James 3, Ron 8
list of places from mapper 2: NY 6, UK 5
list of names from mapper 2: Kate 9
Something like this.
How do I save the output of each mapper separately for each type of entity, i.e. names or places?
How will the reducer recognize and reduce only the names to produce a final list, or only the locations to produce a final list, for that file?
Please help me with this, and let me know which Java methods can help me do this.
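One common way to let a reducer tell names from places is to put the entity type into the map output key, e.g. `place\tNY` or `name\tJames`. The framework groups values by key before calling the reducer, so each reduce call then sees a single entity of a single type. The plain-Java sketch below only simulates the map, shuffle and reduce steps with standard collections (it does not use the Hadoop API); the key layout and the names `TaggedEntityCount`, `mapOutput`, `shuffleAndReduce` and `runExample` are hypothetical, and the counts are the ones from Doc-1 above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaggedEntityCount {

    // Simulated mapper: emits (key, count) pairs where the key is "<type>\t<entity>",
    // so the entity type travels with every record.
    static List<Map.Entry<String, Integer>> mapOutput(String type, Map<String, Integer> entities) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entities.entrySet()) {
            out.add(Map.entry(type + "\t" + e.getKey(), e.getValue()));
        }
        return out;
    }

    // Simulated shuffle + reduce: group records by key (sorted, as in the sort phase)
    // and sum the counts for each key.
    static Map<String, Integer> shuffleAndReduce(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> reduced = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            reduced.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        return reduced;
    }

    // Feeds in the example output of the two mappers from the question.
    static Map<String, Integer> runExample() {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        all.addAll(mapOutput("place", Map.of("NY", 1, "US", 2)));  // mapper 1
        all.addAll(mapOutput("name", Map.of("James", 3, "Ron", 8))); // mapper 1
        all.addAll(mapOutput("place", Map.of("NY", 6, "UK", 5)));  // mapper 2
        all.addAll(mapOutput("name", Map.of("Kate", 9)));           // mapper 2
        return shuffleAndReduce(all);
    }

    public static void main(String[] args) {
        // "name\t..." keys sort before "place\t..." keys, so each type forms one contiguous run.
        runExample().forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Because the type is part of the key, the reducer can inspect the prefix to decide which list a result belongs to. To get each type into its own output file, a Partitioner that looks only at the type prefix could send all `name` keys to one reducer and all `place` keys to another.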
If this is a map-only job (zero reducers), there will be as many output files as there are mappers. If this is a MapReduce job, you can specify the number of reducers. Provide a Partitioner that sends the data from a specific mapper to a specific reducer; since a Partitioner sees only each record's key and value, the mapper has to write something that identifies it into the key. If you are not sure how many mappers there will be, set the number of reducers slightly higher than the expected total and have the Partitioner use only the first n reducers.
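The routing idea can be sketched in plain Java. Hadoop's Partitioner is asked, for each map output record, which reducer index in `[0, numPartitions)` should receive it, and each reducer writes its own `part-NNNNN` file. The sketch below only mirrors that `getPartition(key, value, numPartitions)` shape without using Hadoop classes. It assumes a hypothetical key layout `<mapperId>\t<type>\t<entity>` with mapper IDs starting at 0. In a real job the mapper would need to obtain its own index from the task configuration (how depends on the Hadoop version), so here the IDs are simply hard-coded. `MapperIdPartitioner`, `mapperIdOf` and `route` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

class MapperIdPartitioner {

    // Hypothetical key layout: "<mapperId>\t<type>\t<entity>", e.g. "1\tplace\tNY".
    static int mapperIdOf(String key) {
        return Integer.parseInt(key.substring(0, key.indexOf('\t')));
    }

    // Partitioner-style contract: return a reducer index in [0, numPartitions).
    // The value is ignored; routing depends only on the mapper ID in the key.
    static int getPartition(String key, String value, int numPartitions) {
        // With more reducers than mappers, mapper i goes to reducer i and the extra
        // reducers stay empty; with fewer, IDs wrap around and mappers share a file.
        return mapperIdOf(key) % numPartitions;
    }

    // Simulated shuffle: put each record in the bucket of the reducer the partitioner picks.
    static List<List<String>> route(List<String> keys, int numReducers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(getPartition(key, "", numReducers)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Two mappers (IDs 0 and 1), but four reducers configured as a safety margin.
        List<String> keys = List.of(
                "0\tplace\tNY", "0\tname\tJames",
                "1\tplace\tUK", "1\tname\tKate");
        List<List<String>> buckets = route(keys, 4);
        for (int r = 0; r < buckets.size(); r++) {
            // Reducers 2 and 3 receive no records and would write empty output files.
            System.out.println("reducer " + r + ": " + buckets.get(r));
        }
    }
}
```

The trade-off of over-provisioning reducers is a few empty output files, which is usually cheaper than two mappers' data landing in the same file.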