
Add input data on the fly to Hadoop Map-Reduce Job?

Original post 2015-01-26 17:33:39 1 2 java/ hadoop/ hdfs

Can I append input files or input data to a map-reduce job at run time without creating a race condition?

2 answers

I think that, in theory, you can add more files to the input as long as the addition:

Matches your FileInputFormat input pattern
Happens before the InputFormat.getSplits() call, which in practice leaves you only a very short window after you submit the job.

Regarding the race condition after the splits are computed, note that appending to existing files has only been available since Hadoop version 0.21.0.

Even if you can modify your files, the split points have already been precomputed, so your new data will most likely not be picked up by the mappers. I doubt, though, that this would crash your flow.
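To make the point about precomputed splits concrete, here is a minimal plain-Java sketch. It does not use the Hadoop API: the `Split` class, `computeSplits`, and `readSplit` are hypothetical stand-ins that only model the idea of listing the input once and then reading fixed byte ranges. Under those assumptions, data appended after the snapshot and files created after it are never read.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class SplitSnapshotDemo {

    /** Hypothetical stand-in for an input split: a byte range of one file. */
    public static final class Split {
        final Path file;
        final long start;
        final long length;

        Split(Path file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }
    }

    /** Lists the directory once and records each file's length at that moment. */
    public static List<Split> computeSplits(Path inputDir) throws IOException {
        List<Split> splits = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir)) {
            for (Path f : files) {
                splits.add(new Split(f, 0, Files.size(f)));
            }
        }
        return splits;
    }

    /** Reads only the bytes the split describes, ignoring anything added later. */
    public static String readSplit(Split s) throws IOException {
        byte[] buf = new byte[(int) s.length];
        try (RandomAccessFile raf = new RandomAccessFile(s.file.toFile(), "r")) {
            raf.seek(s.start);
            raf.readFully(buf);
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    /** Runs the scenario and returns everything the modelled "mappers" saw. */
    public static String run() {
        try {
            Path dir = Files.createTempDirectory("input");
            Path a = dir.resolve("a.txt");
            Files.write(a, "first\n".getBytes(StandardCharsets.UTF_8));

            // The snapshot is taken here, before any later changes.
            List<Split> splits = computeSplits(dir);

            // Changes made after the snapshot: an append and a brand-new file.
            Files.write(a, "appended\n".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.APPEND);
            Files.write(dir.resolve("b.txt"),
                    "late file\n".getBytes(StandardCharsets.UTF_8));

            StringBuilder seen = new StringBuilder();
            for (Split s : splits) {
                seen.append(readSplit(s));
            }
            return seen.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.print(run()); // prints only "first"
    }
}
```

In this model nothing fails when the input changes; the late data is simply ignored, which matches the expectation above that the flow would not crash.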

One thing you can experiment with is disabling splits within a file (that is, assigning one mapper per file) and then trying to append. Some of the data that had a chance to get flushed may end up in a mapper, but that is just my wild guess.
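In Hadoop itself, the hook for this experiment is overriding `FileInputFormat.isSplitable(JobContext, Path)` to return `false`. The sketch below is not that API; it is a simplified plain-Java model (the `SplitPlanner` class and `planSplits` method are hypothetical, and real split sizing has extra rules this ignores) that shows what the flag changes: a splittable file is cut into fixed-size ranges, while a non-splittable one becomes a single range handled by one mapper.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanner {

    /**
     * Returns {start, length} pairs covering a file of the given length.
     * Simplified model: fixed-size ranges, with a shorter final range.
     */
    public static List<long[]> planSplits(long fileLength, long splitSize,
                                          boolean splittable) {
        if (fileLength <= 0 || splitSize <= 0) {
            throw new IllegalArgumentException("lengths must be positive");
        }
        List<long[]> splits = new ArrayList<>();
        if (!splittable) {
            // One range for the whole file, so one mapper per file.
            splits.add(new long[] {0, fileLength});
            return splits;
        }
        for (long start = 0; start < fileLength; start += splitSize) {
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new long[] {start, length});
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 300;
        long splitSize = 128;
        System.out.println(planSplits(fileLength, splitSize, true).size());  // 3
        System.out.println(planSplits(fileLength, splitSize, false).size()); // 1
    }
}
```

Note that even the single range in this model records the file length seen at planning time, so whether a real mapper would read flushed data beyond that length depends on how its record reader treats the end of the split.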

Effectively, the answer is "no". The splits are computed very early in the game, and after that your new files will not be included.



The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact yoyou2525@163.com.


solution1
1 2015-01-26 17:53:35

solution2
1 2015-01-28 10:23:48
