Hadoop MapReduce over multiple inputs

I would like to use multiple input formats in a single job. I have used org.apache.hadoop.mapreduce.lib.input.MultipleInputs; however, this utility seems to be designed only for inputs that exist on HDFS (that is, inputs that have a Path).

Is there a way to use multiple input formats from disparate sources?

My specific need is as follows...

I would like a single job that performs a reduce-side join between an existing Elasticsearch index (using the ESInputFormat provided by https://github.com/elasticsearch/elasticsearch-hadoop) and a set of sequence files that contain information to be indexed. The job should read from both inputs, merge them in the reduce phase, and insert the result into another index (with some additional logic) for later use.

Suggestions?

You can still use MultipleInputs and just pass in a non-null Path. The path doesn't need to point to a valid location for this to work; it just can't be null.
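A minimal sketch of what that job setup might look like, assuming a Hadoop 2+ style API and a version of elasticsearch-hadoop in which the input format is org.elasticsearch.hadoop.mr.EsInputFormat (the question spells it ESInputFormat). The class name JoinJobSetup, both mapper classes, both paths, the index name, and the node address are hypothetical placeholders, and the "es.nodes" / "es.resource" settings are assumed from the elasticsearch-hadoop configuration; the reducer and the output side are left out.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.elasticsearch.hadoop.mr.EsInputFormat;

public class JoinJobSetup {

    // Hypothetical mapper for records read from Elasticsearch.
    // Tags each value so the reducer can tell the two sources apart.
    public static class EsSideMapper extends Mapper<Object, Object, Text, Text> {
        @Override
        protected void map(Object key, Object value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.toString()), new Text("ES\t" + value));
        }
    }

    // Hypothetical mapper for records read from the sequence files.
    public static class SeqSideMapper extends Mapper<Object, Object, Text, Text> {
        @Override
        protected void map(Object key, Object value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.toString()), new Text("SEQ\t" + value));
        }
    }

    public static Job buildJob() throws IOException {
        Configuration conf = new Configuration();
        // Assumed elasticsearch-hadoop settings; adjust to your cluster and index.
        conf.set("es.nodes", "localhost:9200");
        conf.set("es.resource", "source-index");

        Job job = Job.getInstance(conf, "reduce-side join (sketch)");
        job.setJarByClass(JoinJobSetup.class);

        // Real HDFS input: the sequence files to be joined.
        MultipleInputs.addInputPath(job, new Path("/data/to-index"),
                SequenceFileInputFormat.class, SeqSideMapper.class);

        // Elasticsearch input: the Path is a non-null placeholder that the
        // input format is expected to ignore, per the answer above.
        MultipleInputs.addInputPath(job, new Path("/placeholder/es-input"),
                EsInputFormat.class, EsSideMapper.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // Reducer and output format configuration omitted in this sketch.
        return job;
    }
}
```

The placeholder path should differ from every real input path, since MultipleInputs associates each input format and mapper with the path it was registered under.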

This is OK, I suppose.
