I am trying to read from multiple HDFS .gz files, but I only want those with yesterday's date as the start of the filename. My files look like this:
/notmy-data/openSourceDatasets/Temperatures/2013-06-10T133006.gz
/notmy-data/openSourceDatasets/Temperatures/2013-06-11T153006.gz
/notmy-data/openSourceDatasets/Temperatures/2013-06-11T173006.gz
/notmy-data/openSourceDatasets/Temperatures/2013-06-11T193006.gz
This is what I have:
DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
Calendar cal = Calendar.getInstance();
cal.add(Calendar.DATE, -1);
String yesterdate = dateFormat.format(cal.getTime());
Path tpath = new Path("/notmy-data/openSourceDatasets/Temperatures/" + yesterdate + "*");
FileStatus[] status = fileSystem.listStatus(tpath);
System.out.println("[" + new Date() + "] Starting temperature ingest...");
for (int i = 0; i < status.length; i++) {
    BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(fileSystem.open(status[i].getPath()))));
    String line;
    while (null != (line = reader.readLine())) {
        System.out.println(line);
    }
    reader.close();
}
I tried this both with and without the star, and I always get a java.io.FileNotFoundException. What am I doing wrong?
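Side note: the `Calendar`/`SimpleDateFormat` lines that build `yesterdate` can be written more compactly with `java.time` (Java 8+). A minimal sketch; `YesterdayPrefix`, `datePrefix` and `yesterdayPrefix` are hypothetical names used only for illustration, and "yesterday" is taken in the JVM's default time zone, as `Calendar.getInstance()` does.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class YesterdayPrefix {
    // ISO_LOCAL_DATE formats as yyyy-MM-dd, matching the "2013-06-11" file name prefix
    static String datePrefix(LocalDate day) {
        return day.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // Yesterday's date in the JVM's default time zone
    static String yesterdayPrefix() {
        return datePrefix(LocalDate.now().minusDays(1));
    }
}
```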
This is probably not the best way to do it, but it works:
Path tpath = new Path("/notmy-data/openSourceDatasets/Temperatures/"); // changed: list the directory itself
FileStatus[] status = fileSystem.listStatus(tpath);
System.out.println("[" + new Date() + "] Starting temperature ingest...");
for (int i = 0; i < status.length; i++) {
    // added: skip files whose name does not start with yesterday's date
    String[] fileNameBits = status[i].getPath().toString().split("/");
    String fileDate = fileNameBits[fileNameBits.length - 1].split("T")[0];
    if (!fileDate.equals(yesterdate)) {
        continue;
    }
    // end of added code
    BufferedReader reader = new BufferedReader(new InputStreamReader(new GZIPInputStream(fileSystem.open(status[i].getPath()))));
    String line;
    while (null != (line = reader.readLine())) {
        System.out.println(line);
    }
    reader.close();
}
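For context, Hadoop's `FileSystem.listStatus()` treats its argument as a literal path, so neither `.../2013-06-11*` nor `.../2013-06-11` exists and the call throws `FileNotFoundException`; `FileSystem.globStatus(Path)` is the Hadoop method that expands wildcards. If you keep the filtering approach above, the string splitting can be reduced to one prefix check on the file name. The sketch below works on plain path strings so that it runs without Hadoop; `DateFilter`, `fileName` and `filterByDate` are hypothetical names, and with a Hadoop `Path` you would get the last component from `status[i].getPath().getName()` instead.

```java
import java.util.ArrayList;
import java.util.List;

class DateFilter {
    // Last path component, e.g. "2013-06-11T153006.gz"
    static String fileName(String path) {
        int slash = path.lastIndexOf('/');
        return path.substring(slash + 1); // whole string if there is no '/'
    }

    // Keeps only paths whose file name starts with "<date>T"
    static List<String> filterByDate(List<String> paths, String date) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (fileName(p).startsWith(date + "T")) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```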
Use Files.newDirectoryStream():
// Does globbing for you!
try (final DirectoryStream<Path> dirstream
        = Files.newDirectoryStream(Paths.get("yourBaseDir"), yesterdate + '*')) {
    for (final Path path : dirstream) {
        // do stuff with "path"
    }
}
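As a compilable version of the snippet above: a minimal sketch, assuming the files sit on the default (local) file system, since `java.nio.file.Path` is not Hadoop's `Path` and this does not list HDFS directories. `GlobListing` and `matching` are hypothetical names. The glob is matched against each entry's file name only.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class GlobListing {
    // Lists entries of baseDir whose file names match the glob "<datePrefix>*"
    static List<Path> matching(Path baseDir, String datePrefix) throws IOException {
        List<Path> result = new ArrayList<>();
        // try-with-resources closes the stream, which holds an open directory handle
        try (DirectoryStream<Path> dirstream = Files.newDirectoryStream(baseDir, datePrefix + "*")) {
            for (Path path : dirstream) {
                result.add(path);
            }
        }
        Collections.sort(result); // directory iteration order is unspecified
        return result;
    }
}
```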
The real answer, however, will have to wait until you tell us what that FileStatus is.
Also, opening a BufferedReader on a Path object is MUCH easier than what you do: use Files.newBufferedReader(). Note that it does not decompress, so for .gz files you still need a GZIPInputStream in between.
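For a gzip file on the local file system, the reading loop from the question can be sketched with `java.nio` like this. `GzLines` and `readGzLines` are hypothetical names, and UTF-8 is an assumed encoding for the files. For HDFS, the same wrapping applies with `fileSystem.open(...)` in place of `Files.newInputStream(...)`, as in the question's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

class GzLines {
    // Reads all lines of a gzip-compressed text file (UTF-8 assumed)
    static List<String> readGzLines(Path gzFile) throws IOException {
        List<String> lines = new ArrayList<>();
        // Decompress the raw byte stream, then decode it as text
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```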