
Get input path from job conf in Hadoop

I am setting a path as the input location on conf:

FileInputFormat.setInputPaths(conf, new Path("path/to/folder"));

How can I retrieve this location from conf? I am trying to implement my own RecordReader.

Thanks in advance...

The property set by this call is mapred.input.dir (renamed to mapreduce.input.fileinputformat.inputdir in Hadoop 2 and later, where the old name remains a deprecated alias), so this should work for you:

conf.get("mapred.input.dir");
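As a rough sketch of the lookup above, the snippet below reads the input location back in two ways: through the raw property, and through FileInputFormat.getInputPaths, which parses the property into Path objects. It assumes the newer org.apache.hadoop.mapreduce API on Hadoop 2 or later; the class name InputPathLookup and the helper inputPathsOf are made up for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class InputPathLookup {

    // Hypothetical helper: returns the input paths registered on the job.
    static Path[] inputPathsOf(Job job) {
        return FileInputFormat.getInputPaths(job);
    }

    public static void main(String[] args) throws IOException {
        Job job = Job.getInstance(new Configuration());
        FileInputFormat.setInputPaths(job, new Path("path/to/folder"));

        // Raw property value: a comma-separated list of qualified paths.
        String raw = job.getConfiguration().get(FileInputFormat.INPUT_DIR);
        System.out.println("raw property: " + raw);

        // Parsed form: one Path per input location.
        for (Path p : inputPathsOf(job)) {
            System.out.println("input path: " + p);
        }
    }
}
```

Inside a RecordReader, the same call should work with the TaskAttemptContext passed to initialize, since it is also a JobContext. Note that the stored paths are qualified against the default file system, so they will not match the relative string you passed in character for character.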

On a side note, your record reader should act on the input split it is given in the initialize(InputSplit, TaskAttemptContext) method. The folder you pass to setInputPaths actually resolves to a number of input splits, typically one for each file in the folder (and possibly multiple input splits for larger, splittable files).

Input formats based on FileInputFormat pass a FileSplit to the initialize method, so you should be able to get the actual file to be processed from FileSplit.getPath().
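To make the FileSplit point concrete, here is a minimal sketch of a reader that only inspects its split. The class SplitInfoRecordReader is hypothetical and deliberately trivial: it emits a single record per split (file path, split length) instead of parsing file contents, and it assumes the split really is a FileSplit, as it would be for a FileInputFormat subclass.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical reader: emits one record per split, (file path, split length).
public class SplitInfoRecordReader extends RecordReader<Text, LongWritable> {

    private Path file;          // file backing this split
    private long length;        // number of bytes covered by this split
    private boolean consumed;   // true once the single record has been returned

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException {
        // Assumption: the input format extends FileInputFormat, so the split is a FileSplit.
        FileSplit fileSplit = (FileSplit) split;
        file = fileSplit.getPath();
        length = fileSplit.getLength();
        consumed = false;
    }

    @Override
    public boolean nextKeyValue() {
        if (consumed) {
            return false;
        }
        consumed = true;
        return true;
    }

    @Override
    public Text getCurrentKey() {
        return new Text(file.toString());
    }

    @Override
    public LongWritable getCurrentValue() {
        return new LongWritable(length);
    }

    @Override
    public float getProgress() {
        return consumed ? 1.0f : 0.0f;
    }

    @Override
    public void close() {
        // Nothing to release: this sketch never opens the file.
    }
}
```

A real reader would open the file from getPath() in initialize, seek to the split's start offset, and stop once it has read past the end of the split.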
