
Add input data on the fly to Hadoop Map-Reduce Job?

Can I append input files or input data to a map-reduce job at run time without creating a race condition?

I think that, in theory, you can add more files to the input as long as each new file:

  1. Matches your FileInputFormat input pattern
  2. Arrives before InputFormat.getSplits() is called, which leaves you only a very short window after you submit the job.
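The two conditions above can be illustrated with a small simulation. This is only a sketch in plain Python, not Hadoop code: `compute_splits` is a hypothetical stand-in for the file listing that happens inside InputFormat.getSplits(), and the file names and the `part-*.txt` pattern are made up for the example.

```python
import fnmatch
import os
import tempfile


def compute_splits(input_dir, pattern):
    """Hypothetical stand-in for split computation: list matching files once."""
    return sorted(
        name for name in os.listdir(input_dir) if fnmatch.fnmatch(name, pattern)
    )


def write_file(input_dir, name, text):
    """Create a small text file inside the input directory."""
    with open(os.path.join(input_dir, name), "w") as handle:
        handle.write(text)


def simulate_late_file():
    with tempfile.TemporaryDirectory() as input_dir:
        write_file(input_dir, "part-0001.txt", "a\n")
        write_file(input_dir, "part-0002.txt", "b\n")
        write_file(input_dir, "notes.log", "ignored\n")  # does not match the pattern

        # Added before the listing is taken: it becomes part of the input.
        write_file(input_dir, "part-0003.txt", "c\n")
        splits = compute_splits(input_dir, "part-*.txt")

        # Added after the listing is taken: the snapshot does not change.
        write_file(input_dir, "part-0004.txt", "d\n")
        return splits


if __name__ == "__main__":
    print(simulate_late_file())
```

In this toy model, `part-0003.txt` is included because it exists when the listing is taken, while `part-0004.txt` and `notes.log` are not. In a real job the window before the listing is not something the client controls precisely, which is why relying on it is fragile.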

Regarding the race condition after the splits are computed, note that appending to existing files has only been available since version 0.21.0.

And even if you can modify your files, the split points have already been computed, so your new data will most likely not be picked up by the mappers. I doubt, though, that it would crash your flow.
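Why appended data is likely to be missed can be sketched with a toy model of byte-range splits. This is an assumption-laden illustration, not Hadoop's implementation: `compute_byte_splits` and `read_splits` are hypothetical helpers, and the model ignores the record-boundary handling that a real record reader performs.

```python
import os
import tempfile


def compute_byte_splits(path, split_size):
    """Record (offset, length) ranges from the file size seen right now."""
    size = os.path.getsize(path)
    return [
        (offset, min(split_size, size - offset))
        for offset in range(0, size, split_size)
    ]


def read_splits(path, splits):
    """Read only the recorded byte ranges, as a toy mapper input would."""
    chunks = []
    with open(path, "rb") as handle:
        for offset, length in splits:
            handle.seek(offset)
            chunks.append(handle.read(length))
    return b"".join(chunks)


def simulate_append():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.txt")
        with open(path, "wb") as handle:
            handle.write(b"0123456789")

        # Split points are fixed from the 10-byte file.
        splits = compute_byte_splits(path, split_size=4)

        # Data appended afterwards lies outside every recorded range.
        with open(path, "ab") as handle:
            handle.write(b"LATE")

        return splits, read_splits(path, splits)


if __name__ == "__main__":
    print(simulate_append())
```

Here the recorded ranges cover only the first 10 bytes, so the appended `LATE` bytes are never read, and nothing fails either, which matches the expectation that the job simply ignores the new data.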

One thing you can experiment with is disabling splits within a file (that is, assigning one mapper per file) and then trying to append. Some data that had a chance to get flushed might end up in a mapper, but that is just my wild guess.

Effectively, the answer is "no". The splits are computed very early in the job's life, and any files you add after that will not be included.

