
Hadoop MapReduce over multiple inputs

Original post: 2014-01-23 20:35:36 | Tags: java, hadoop, mapreduce, elasticsearch

I would like to use multiple input formats in a single job. I have tried org.apache.hadoop.mapreduce.lib.input.MultipleInputs; however, this utility seems to be designed only for inputs that exist on HDFS (that is, inputs that have a Path).

Is there a way to use multiple input formats from disparate sources?

My specific need is as follows.

I would like a single job that performs a reduce-side join between an existing Elasticsearch index (read with the ESInputFormat provided by https://github.com/elasticsearch/elasticsearch-hadoop ) and a set of sequence files that contain information to be indexed. I would like to read from both inputs, merge them in the reduce phase, and insert the result into another index (with some additional logic) for later use.

Suggestions?
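For readers unfamiliar with the reduce-side join mentioned above, here is a minimal sketch of the idea in plain Java, with no Hadoop dependency. Each record is tagged with the source it came from, records are grouped by key (standing in for the shuffle), and the "reducer" pairs index records with sequence-file records that share a key. The class name, the "index" and "files" tags, the tab-separated output, and the inner-join behaviour are all assumptions made for illustration; the question does not specify them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    /** A value tagged with the (hypothetical) source it was read from. */
    static final class Tagged {
        final String source;
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    /** "Map" step: group tagged values by key, which simulates the shuffle. */
    static void emit(Map<String, List<Tagged>> shuffle, String key, String source, String value) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(new Tagged(source, value));
    }

    /** "Reduce" step for one key: pair every index value with every file value. */
    static List<String> reduce(String key, List<Tagged> values) {
        List<String> fromIndex = new ArrayList<>();
        List<String> fromFiles = new ArrayList<>();
        for (Tagged t : values) {
            if ("index".equals(t.source)) {
                fromIndex.add(t.value);
            } else {
                fromFiles.add(t.value);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String i : fromIndex) {
            for (String f : fromFiles) {
                joined.add(key + "\t" + i + "\t" + f);
            }
        }
        return joined; // empty when the key appears in only one source (inner join)
    }

    /** Runs both steps over two inputs given as {key, value} pairs. */
    static List<String> join(List<String[]> indexRecords, List<String[]> fileRecords) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>(); // sorted by key, like reducer input
        for (String[] r : indexRecords) {
            emit(shuffle, r[0], "index", r[1]);
        }
        for (String[] r : fileRecords) {
            emit(shuffle, r[0], "files", r[1]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }
}
```

In a real job the tagging would happen in two separate Mapper classes, one per input format, and the pairing logic would live in the Reducer.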

1 Answer

You can still use MultipleInputs; just pass in a non-null Path. The Path does not need to point to a valid location for this to work; it just cannot be null.
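A rough sketch of what that could look like in a job driver follows. It needs the Hadoop client libraries on the classpath and has not been run against a cluster. The class name, method name, and the placeholder path string are hypothetical; the external input format (for example the ESInputFormat from the question) and both mapper classes are passed in as parameters because the original post does not show them. The only Hadoop calls used are MultipleInputs.addInputPath and the Path constructor.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

public final class MultiSourceInputs {

    private MultiSourceInputs() {
    }

    /** Registers one HDFS-backed input and one non-HDFS input on the same job. */
    @SuppressWarnings("rawtypes")
    public static void configure(Job job,
                                 Path sequenceFileDir,
                                 Class<? extends Mapper> sequenceFileMapper,
                                 Class<? extends InputFormat> externalFormat,
                                 Class<? extends Mapper> externalMapper) {
        // Ordinary case: the sequence files really do live at this path.
        MultipleInputs.addInputPath(job, sequenceFileDir,
                SequenceFileInputFormat.class, sequenceFileMapper);

        // Assumption: the external format ignores input paths and reads its
        // source from the job configuration, so this path is never opened.
        // It is kept distinct from the real path because MultipleInputs
        // associates each format and mapper with its path.
        Path placeholder = new Path("/placeholder/external-input");
        MultipleInputs.addInputPath(job, placeholder, externalFormat, externalMapper);
    }
}
```

Connection settings for the external source (hosts, index name, query) would still have to be set on the job's Configuration as that input format's own documentation describes.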

This is OK, I suppose.