Reading a specific file from a directory containing many files in Hadoop

Question

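I want to read a specific file, chosen by its file name, from a list of files stored in Hadoop. If the file name matches a name I give, I want to process that file's data. This is what I tried in the map method: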

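@Override
public void map(LongWritable key, Text value, Context con) throws IOException, InterruptedException
        {
            // Find out which file the current input split comes from
            FileSplit fs = (FileSplit) con.getInputSplit();
            String filename = fs.getPath().getName();
            // Keep only the part before the first '-', e.g. "aak-00000" -> "aak"
            String prefix = filename.split("-")[0];
            if (prefix.equals("aak"))
            {
                    String[] tokens = value.toString().split("\t");
                    String name = tokens[0];
                    // Emit the first tab-separated field together with the source file name
                    con.write(new Text(name), new Text(filename));
            }

        }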
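The mapper-side check works, but every file in the directory is still read and every record still goes through map(); the check only discards the unwanted ones. The decision itself is a simple prefix test on the file name. Below is a minimal, Hadoop-free sketch of that test (the class `SplitFileCheck` is hypothetical, not part of Hadoop) so the logic can be checked without a cluster. Because a `FileSplit` always belongs to a single file, the same test could also be run once per task in `setup()` instead of once per record.

```java
// Hypothetical helper, not part of Hadoop: the same prefix test the mapper
// performs on FileSplit.getPath().getName(), pulled out so it can run locally.
final class SplitFileCheck {
    private SplitFileCheck() {}

    // "aak-00000" -> "aak"; a name without '-' is returned unchanged.
    // Gives the same result as filename.split("-")[0] for ordinary file names.
    static String prefixOf(String fileName) {
        int dash = fileName.indexOf('-');
        return dash < 0 ? fileName : fileName.substring(0, dash);
    }

    // True if the file's prefix equals the wanted name exactly.
    static boolean shouldProcess(String fileName, String wanted) {
        return prefixOf(fileName).equals(wanted);
    }

    public static void main(String[] args) {
        String[] names = {"aak-00000", "aak-00001", "bbc-00000", "aakx-00000"};
        for (String name : names) {
            System.out.println(name + " -> " + shouldProcess(name, "aak"));
        }
    }
}
```

Note that `"aakx-00000"` is rejected because the whole prefix must equal `"aak"`, not merely start with it.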

Answer 1

You need to write a custom PathFilter implementation and then register it with setInputPathFilter on FileInputFormat in your driver code. Take a look at the following link:

https://hadoopi.wordpress.com/2013/07/29/hadoop-filter-input-files-used-for-mapreduce/
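In Hadoop the interface is `org.apache.hadoop.fs.PathFilter`, with a single `boolean accept(Path path)` method, and it is registered with `FileInputFormat.setInputPathFilter(job, YourFilter.class)`. The sketch below models only the selection logic in plain Java so it runs without Hadoop; the class name `AakFilter` and the `isDirectory` flag (standing in for a `FileSystem` lookup) are assumptions. It lets directories through because, as far as I know, `FileInputFormat` also passes the input directory itself through the filter, so a filter that rejects directories can leave the job with no input at all.

```java
import java.util.ArrayList;
import java.util.List;

// Hadoop-free model of a custom PathFilter (hypothetical class name). In a real
// job this logic would sit in accept(Path) of a class implementing
// org.apache.hadoop.fs.PathFilter.
final class AakFilter {
    // Prefix taken from the question's file names ("aak-00000", ...).
    static final String PREFIX = "aak-";

    // isDirectory stands in for asking the FileSystem about the path (assumption).
    static boolean accept(String name, boolean isDirectory) {
        // Accept directories so the input directory itself is not filtered out.
        if (isDirectory) {
            return true;
        }
        return name.startsWith(PREFIX);
    }

    // Apply the filter to the plain files inside an input directory.
    static List<String> selectFiles(List<String> fileNames) {
        List<String> selected = new ArrayList<>();
        for (String name : fileNames) {
            if (accept(name, false)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Prints [aak-00000, aak-00001]
        System.out.println(selectFiles(List.of("aak-00000", "bbc-00000", "aak-00001")));
    }
}
```

Unlike the check inside map(), a path filter acts before input splits are computed, so the rejected files are never read.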

Answer 2

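Use the PathFilter that Arani suggested (+1 for that), or,
if your only criterion for choosing input files is that the name starts with the string "aak-", I think you can easily do what you want by changing the input path in your main method (the Driver class), as follows: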

Replace:

String inputPath = "/your/input/path"; //containing the file /your/input/path/aak-00000   
FileInputFormat.setInputPaths(conf, new Path(inputPath));

with:

String inputPath = "/your/input/path"; //containing the file /your/input/path/aak-00000
FileInputFormat.setInputPaths(conf, new Path(inputPath+"/aak-*"));
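To check which file names a pattern such as `aak-*` selects, the sketch below uses the JDK's `java.nio` glob matcher as a stand-in. This is an approximation: Hadoop implements its own glob handling, so the two are assumed to agree only for simple patterns like this one. The class name `GlobCheck` is hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Approximates how the input path "/your/input/path/aak-*" selects files,
// using the JDK glob matcher on bare file names (not Hadoop's implementation).
final class GlobCheck {
    static List<String> matching(String glob, List<String> fileNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return fileNames.stream()
                .filter(name -> matcher.matches(Paths.get(name)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints [aak-00000, aak-00001]
        System.out.println(matching("aak-*",
                List.of("aak-00000", "part-00000", "aak-00001", "aakx-00000")));
    }
}
```

With a glob in the input path, only the matching files become input to the job, so the other files in the directory are never opened.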

Answer 1: 1 vote, accepted, 2014-12-30 10:06:19
Answer 2: 1 vote, 2014-12-30 13:36:59
