
Implementing multiple mappers and a single reducer in Hadoop

I am new to Hadoop. I have multiple folders containing files that I need to process in Hadoop, and I am unsure how to implement the mapper in a MapReduce job. Can I specify multiple mappers to process multiple files and combine all of the input files into one output using a single reducer? If so, please give guidelines for implementing these steps.

If you have multiple files, use MultipleInputs.

The addInputPath() method can be used to:

  1. add multiple paths that share one common mapper implementation
  2. add multiple paths, each with its own custom mapper and input format implementation.
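
As a rough sketch of the second option, the driver below registers two input folders, each with its own mapper, and sends everything to one reducer. It assumes the newer org.apache.hadoop.mapreduce API and the Hadoop client libraries on the classpath. The class names (MultiMapperDriver, FirstMapper, SecondMapper, MergeReducer), the shared key "all", the "A:"/"B:" tags and the use of command-line arguments for the paths are all hypothetical choices made for illustration, not part of the original answer.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiMapperDriver {

    // Hypothetical mapper for the first folder: tags each line with "A:".
    public static class FirstMapper extends Mapper<LongWritable, Text, Text, Text> {
        private static final Text KEY = new Text("all");

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(KEY, new Text("A:" + line));
        }
    }

    // Hypothetical mapper for the second folder: tags each line with "B:".
    public static class SecondMapper extends Mapper<LongWritable, Text, Text, Text> {
        private static final Text KEY = new Text("all");

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(KEY, new Text("B:" + line));
        }
    }

    // Receives the output of both mappers and writes it out unchanged.
    public static class MergeReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(key, value);
            }
        }
    }

    // Assumed arguments: <input folder 1> <input folder 2> <output folder>
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-mapper sketch");
        job.setJarByClass(MultiMapperDriver.class);

        // One addInputPath call per folder, each naming its input format and mapper.
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, FirstMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, SecondMapper.class);

        job.setReducerClass(MergeReducer.class);
        job.setNumReduceTasks(1); // a single reduce task, so a single output file
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

For the first option (one common mapper), the same mapper class would simply be passed to every addInputPath call.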

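To get a single reducer, have every mapper emit the same output key, say 1 or "abc". All records then go to the same reducer. Note that the framework does not derive the number of reduce tasks from the keys: that number is set on the job, for example with job.setNumReduceTasks(1), and with a single reduce task a shared key is not strictly required.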
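
To see why a shared key sends every record to the same reducer, here is a small plain-Java model of the shuffle step. The partition method uses the same arithmetic as Hadoop's default HashPartitioner, but the class ShuffleSketch, its method names and the use of String keys (instead of Hadoop Writable types, whose hash codes differ) are assumptions made only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ShuffleSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // a non-negative hash of the key, modulo the number of reduce tasks.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups the values of (key, value) records by the reducer index
    // that each record's key would be routed to.
    static Map<Integer, List<String>> shuffle(List<String[]> records, int numReduceTasks) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String[] record : records) {
            int reducer = partition(record[0], numReduceTasks);
            byReducer.computeIfAbsent(reducer, r -> new ArrayList<>()).add(record[1]);
        }
        return byReducer;
    }
}
```

With one reduce task, every key maps to partition 0. With several reduce tasks, records that share a key still land in a single partition, and the remaining reducers simply receive no input.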

If the files are to be mapped in the same way (e.g. they all have the same format and processing requirements), then you can configure a single mapper to process all of them.

You do this by configuring the TextInputFormat class:

String folder1 = "file:///home/chrisgerken/blah/blah/folder1";
String folder2 = "file:///home/chrisgerken/blah/blah/folder2";
String folder3 = "file:///home/chrisgerken/blah/blah/folder3";
// setInputPaths is a static method that TextInputFormat inherits from FileInputFormat
TextInputFormat.setInputPaths(job, new Path(folder1), new Path(folder2), new Path(folder3));

This will result in all of the files in folders 1, 2 and 3 being processed by the mapper.

Of course, if you need to use a different input type, you'll have to configure that type appropriately.
