
Can I iterate through HDFS files that begin with a specific string in Java?

I am trying to read from multiple HDFS .gz files, but I only want those with yesterday's date as the start of the filename. My files look like this:

/notmy-data/openSourceDatasets/Temperatures/2013-06-10T133006.gz
/notmy-data/openSourceDatasets/Temperatures/2013-06-11T153006.gz
/notmy-data/openSourceDatasets/Temperatures/2013-06-11T173006.gz
/notmy-data/openSourceDatasets/Temperatures/2013-06-11T193006.gz

This is what I have...

DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
Calendar cal = Calendar.getInstance();
cal.add(Calendar.DATE, -1);
String yesterdate = dateFormat.format(cal.getTime());
Path tpath = new Path("/notmy-data/openSourceDatasets/Temperatures/" + yesterdate + "*");
FileStatus[] status = fileSystem.listStatus(tpath);
System.out.println("[" + new Date() + "] Starting temperature ingest...");

for (int i = 0; i < status.length; i++) {
    BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(fileSystem.open(status[i].getPath()))));
    String line;
    while (null != (line = reader.readLine())) {
        System.out.println(line);
    }
}

I tried this with and without the star. I always get a java.io.FileNotFoundException. What am I doing wrong?

This is probably not the best way to do it, but it works:

// changed: list the whole directory instead of passing a wildcard
Path tpath = new Path("/notmy-data/openSourceDatasets/Temperatures/");
FileStatus[] status = fileSystem.listStatus(tpath);
System.out.println("[" + new Date() + "] Starting temperature ingest...");

for (int i = 0; i < status.length; i++) {
    // added: skip files whose name does not start with yesterday's date
    String[] fileNameBits = status[i].getPath().toString().split("/");
    String fileDate = fileNameBits[fileNameBits.length - 1].split("T")[0];
    if (!fileDate.equals(yesterdate)) {
        continue;
    }
    // end of added code
    BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(fileSystem.open(status[i].getPath()))));
    String line;
    while (null != (line = reader.readLine())) {
        System.out.println(line);
    }
    reader.close();
}
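
For reference, the date check can be pulled out into a small helper that works on plain path strings, so it can be tried without a cluster. This is only a sketch: the class name YesterdayFilter and its methods are made up for illustration, and it uses java.time instead of Calendar and SimpleDateFormat.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop or of the original post.
final class YesterdayFilter {

    // LocalDate.toString() yields yyyy-MM-dd, the same shape as the file name prefix.
    static String yesterdayPrefix(LocalDate today) {
        return today.minusDays(1).toString();
    }

    // Returns the last path segment, e.g. "2013-06-11T153006.gz".
    static String fileName(String fullPath) {
        return fullPath.substring(fullPath.lastIndexOf('/') + 1);
    }

    // Keeps only the paths whose file name starts with "<prefix>T".
    static List<String> keepMatching(List<String> paths, String prefix) {
        List<String> kept = new ArrayList<>();
        for (String path : paths) {
            if (fileName(path).startsWith(prefix + "T")) {
                kept.add(path);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String dir = "/notmy-data/openSourceDatasets/Temperatures/";
        List<String> paths = Arrays.asList(
                dir + "2013-06-10T133006.gz",
                dir + "2013-06-11T153006.gz",
                dir + "2013-06-11T173006.gz");
        // A fixed "today" keeps the demo reproducible; real code would pass LocalDate.now().
        String prefix = yesterdayPrefix(LocalDate.of(2013, 6, 12));
        for (String path : keepMatching(paths, prefix)) {
            System.out.println(path);
        }
    }
}
```

In the loop above, the strings would come from status[i].getPath().toString(). As far as I know, listStatus() treats its argument as a literal path, which would explain the FileNotFoundException in the question, while FileSystem.globStatus(Path) is the Hadoop call that expands wildcards such as yesterdate + "*". I have not run that against this directory, so treat it as a pointer rather than a tested fix.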

Use Files.newDirectoryStream():

// Does globbing for you!
// try-with-resources closes the directory stream when the loop is done
try (DirectoryStream<Path> dirstream
        = Files.newDirectoryStream(Paths.get("yourBaseDir"), yesterdate + "*")) {
    for (final Path path : dirstream) {
        // do stuff with "path"
    }
}
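
A fuller sketch of the same idea, in case it helps to see it run end to end. It assumes the files live on a filesystem that java.nio.file can reach (the local disk by default, not HDFS), and the class name GlobListing and the temporary directory are invented for the demo.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; only the java.nio.file calls are real API.
final class GlobListing {

    // Returns the sorted names of entries in 'dir' whose file name matches prefix + "*".
    static List<String> namesStartingWith(Path dir, String prefix) throws IOException {
        List<String> names = new ArrayList<>();
        // The second argument is a glob applied to each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
        Collections.sort(names); // directory iteration order is unspecified
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Empty placeholder files stand in for the real .gz data.
        Path dir = Files.createTempDirectory("temperatures");
        Files.createFile(dir.resolve("2013-06-10T133006.gz"));
        Files.createFile(dir.resolve("2013-06-11T153006.gz"));
        Files.createFile(dir.resolve("2013-06-11T173006.gz"));
        System.out.println(namesStartingWith(dir, "2013-06-11"));
    }
}
```

The glob is matched against the file name only, so the directory part stays in the first argument.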

The real answer, however, will have to wait until you say what that FileStatus is.

Also, opening a BufferedReader on a Path object is MUCH easier than what you do: use Files.newBufferedReader().
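
One caveat for .gz files: Files.newBufferedReader() reads the bytes as text and does not decompress, so the GZIPInputStream wrapper is still needed. A minimal sketch, assuming UTF-8 content and a path that java.nio.file can open; the class GzipLines and its writeLines helper exist only to make the demo self-contained.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; UTF-8 is an assumption about the data.
final class GzipLines {

    // Decompresses 'gzFile' and returns its lines.
    static List<String> readLines(Path gzFile) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Writes 'lines' to a gzip file so the demo has something to read back.
    static void writeLines(Path gzFile, List<String> lines) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(gzFile)), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("2013-06-11T153006", ".gz");
        writeLines(sample, Arrays.asList("21.5", "22.0"));
        for (String line : readLines(sample)) {
            System.out.println(line);
        }
    }
}
```

The try-with-resources block closes the reader, and with it the underlying streams, even if reading fails partway through.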
