
Hadoop MapReduce over multiple inputs

I would like to use multiple input formats in a single job. I have tried org.apache.hadoop.mapreduce.lib.input.MultipleInputs, but this utility seems to be designed only for inputs that exist on HDFS (that is, inputs that have a Path).

Is there a way to use multiple input formats from disparate sources?

My specific need is as follows:

I would like a single job that performs a reduce-side join between an existing Elasticsearch index (read with the ESInputFormat provided by https://github.com/elasticsearch/elasticsearch-hadoop) and a set of sequence files that contain information to be indexed. I would like to read from both inputs, merge them in the reduce phase, and insert the result into another index (with some additional logic) for later use.
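To make the intended data flow concrete, here is a minimal plain-Java sketch of a reduce-side join, with no Hadoop dependency. It only illustrates the idea: each record is tagged with its source in the "map" step, grouped by key, and merged per key in the "reduce" step. The class name, the tags "es" and "seq", and the output string format are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    /** A value tagged with the input it came from ("es" or "seq"; hypothetical tags). */
    public static final class Tagged {
        final String source;
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    /** "Map" step: emit (key, tagged value) so the reducer can tell the sources apart. */
    public static void emit(Map<String, List<Tagged>> shuffle,
                            String key, String source, String value) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(new Tagged(source, value));
    }

    /** "Reduce" step: merge every value seen for one key into a single output line. */
    public static String reduce(String key, List<Tagged> values) {
        String indexed = null;   // value that came from the existing index
        String incoming = null;  // value that came from the sequence files
        for (Tagged t : values) {
            if (t.source.equals("es")) {
                indexed = t.value;
            } else {
                incoming = t.value;
            }
        }
        return key + ": indexed=" + indexed + ", incoming=" + incoming;
    }

    /** Runs both "mappers", groups by key (sorted, like a shuffle), then reduces. */
    public static List<String> run(Map<String, String> esDocs, Map<String, String> seqRecords) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        esDocs.forEach((k, v) -> emit(shuffle, k, "es", v));
        seqRecords.forEach((k, v) -> emit(shuffle, k, "seq", v));

        List<String> out = new ArrayList<>();
        shuffle.forEach((k, vs) -> out.add(reduce(k, vs)));
        return out;
    }

    public static void main(String[] args) {
        run(Map.of("doc1", "old"), Map.of("doc1", "new", "doc2", "fresh"))
            .forEach(System.out::println);
    }
}
```

In a real job the two `emit` loops would be two separate Mapper classes, one per input format, and the merged result would be written to the target index instead of returned as strings.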

Suggestions?

You can still use MultipleInputs and just pass in a non-null Path. The Path doesn't need to point to a valid location for this to work; it just can't be null.
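A rough sketch of what that job setup might look like follows. It is not tested against any particular version and makes several assumptions: the Elasticsearch input format is taken to be `org.elasticsearch.hadoop.mr.EsInputFormat` (the class name and package may differ by elasticsearch-hadoop version; the question calls it ESInputFormat), the `es.nodes` and `es.resource` values are placeholders, and the mapper classes, the dummy path, and the sequence-file path are hypothetical. It needs the Hadoop and elasticsearch-hadoop jars on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.elasticsearch.hadoop.mr.EsInputFormat; // assumed class name; check your version

public class MultiSourceJoinDriver {

    /** Hypothetical mapper for index documents: tags each value with "es". */
    public static class EsDocMapper extends Mapper<Object, Object, Text, Text> {
        @Override
        protected void map(Object key, Object value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.toString()), new Text("es\t" + value));
        }
    }

    /** Hypothetical mapper for sequence-file records: tags each value with "seq". */
    public static class SeqRecordMapper extends Mapper<Object, Object, Text, Text> {
        @Override
        protected void map(Object key, Object value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.toString()), new Text("seq\t" + value));
        }
    }

    public static Job buildJob() throws IOException {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "localhost:9200");      // placeholder host
        conf.set("es.resource", "existing-index");   // placeholder index to read

        Job job = Job.getInstance(conf, "multi-source reduce-side join");
        job.setJarByClass(MultiSourceJoinDriver.class);

        // Dummy, non-null Path: the Elasticsearch input format does not read it,
        // but MultipleInputs requires one to register the format/mapper pair.
        MultipleInputs.addInputPath(job, new Path("/ignored/es-input"),
                EsInputFormat.class, EsDocMapper.class);

        // Real HDFS location of the sequence files (hypothetical path).
        MultipleInputs.addInputPath(job, new Path("/data/to-index"),
                SequenceFileInputFormat.class, SeqRecordMapper.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // The reducer and the output format for the target index are omitted here.
        return job;
    }
}
```

Because MultipleInputs records each path as part of a delimited configuration string, it is probably safest to keep the dummy path free of commas and semicolons.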

This is OK, I suppose.
