
Hadoop MapReduce over multiple inputs

Original post 2014-01-23 20:35:36, tags: java, hadoop, mapreduce, elasticsearch

I would like to use multiple input formats in a single job. I have tried org.apache.hadoop.mapreduce.lib.input.MultipleInputs; however, this utility seems to be designed only for inputs that exist on HDFS (that is, inputs that have a Path).

Is there a way to use multiple input formats from disparate sources?

My specific need is as follows...

I would like a single job that performs a reduce-side join between an existing Elasticsearch index (read with the ESInputFormat provided by https://github.com/elasticsearch/elasticsearch-hadoop) and a set of sequence files that contain information to be indexed. The job should read from both inputs, merge them in the reduce phase, and insert the result into another index (with some additional logic) for later use.

Suggestions?

1 answer

You can still use MultipleInputs: just pass in a non-null Path. The path doesn't need to point to a valid location for this to work; it just can't be null.

This is ok I suppose.
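To make the join itself concrete, here is a minimal plain-Java sketch of the tag-and-merge logic a reduce-side join relies on. It deliberately uses no Hadoop or Elasticsearch classes, and every name in it (ReduceSideJoinSketch, Tagged, the "ES" and "SEQ" tags, the sample keys) is hypothetical. In a real job the two map steps would presumably be separate Mapper classes, each registered together with its input format through MultipleInputs.addInputPath, and the merge step would live in the Reducer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    /** A value plus a tag naming the input it came from (hypothetical helper). */
    static final class Tagged {
        final String source;
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    /** Stand-in for one mapper: tags every record of one input and groups it by key. */
    static void map(String source, Map<String, String> input, Map<String, List<Tagged>> shuffle) {
        for (Map.Entry<String, String> e : input.entrySet()) {
            shuffle.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(new Tagged(source, e.getValue()));
        }
    }

    /** Stand-in for the reducer: separates the values for one key by tag and merges them. */
    static String reduce(List<Tagged> values) {
        String indexed = null;                       // value from the existing index, if any
        List<String> incoming = new ArrayList<>();   // values from the sequence files
        for (Tagged t : values) {
            if ("ES".equals(t.source)) {
                indexed = t.value;
            } else {
                incoming.add(t.value);
            }
        }
        return "indexed=" + indexed + " incoming=" + incoming;
    }

    /** Runs both "mappers", then the "reducer", and returns one joined line per key. */
    static Map<String, String> join(Map<String, String> esDocs, Map<String, String> seqRecords) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>(); // keys sorted, as after a shuffle
        map("ES", esDocs, shuffle);
        map("SEQ", seqRecords, shuffle);
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            out.put(e.getKey(), reduce(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> es = new TreeMap<>();
        es.put("doc1", "old title");
        Map<String, String> seq = new TreeMap<>();
        seq.put("doc1", "new title");
        seq.put("doc2", "brand new");
        join(es, seq).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Because each record carries its source tag, the merge step never needs to know where the data physically came from, which is why a placeholder Path on the non-HDFS input does not affect the join.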

How combiner works when we use multiple inputs in Hadoop MapReduce

running multiple MapReduce jobs in hadoop

Hadoop Mapreduce multiple Input files

Java Hadoop MapReduce Multiple Value

Hadoop multiple inputs

When to prefer Hadoop MapReduce over Spark?

multiple file output in hadoop mapreduce streaming

Java Hadoop MapReduce Multiple Keys Values

MapReduce Hadoop on Linux - Multiple data on input

Multiple files as input to Hadoop Dfs and mapreduce


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact yoyou2525@163.com.



solution1 0 ACCEPTED 2014-01-28 02:29:35
