Extracting dates from string

I have a list of file names that look roughly like this: Gadget1-010912000000-020912235959.csv, i.e. they contain two dates indicating the timespan of their data.

The user enters a date format and a file format:

  • File Format in this case: *GADGET*-*DATE_FROM*-*DATE_TO*.csv
  • Date format in this case: ddMMyyHHmmss

What I want to do is extract the three values from the file name, using the given file format and date format.

My problem is: since the date format can differ heavily (hours, minutes and seconds can be separated by a colon, dates by a dot, ...), I don't quite know how to create a fitting regular expression.

You can use a regular expression to remove all non-digit characters and then parse the resulting value.

DateFormat dateFormat = new SimpleDateFormat("ddMMyyHHmmss");

String[] fileNameDetails = "Gadget1-010912000000-020912235959".split("-");

/* Remove every non-digit character; a string without any is left unchanged. */
String date = fileNameDetails[1].replaceAll("[^0-9]", "");

try {
    // Parse the cleaned string, not the raw part of the file name.
    Date from = dateFormat.parse(date);
} catch (ParseException e) {
    // The value does not match the date format; handle or report it here.
    e.printStackTrace();
}

Hope it helps.
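
One caveat with stripping separators: the date format then has to be separator-free as well, otherwise it no longer lines up with the cleaned value. Below is a minimal sketch of that idea. It assumes the user's date format consists only of numeric pattern letters plus separators (no quoted literals, no text fields such as MMM); the class and method names are hypothetical, and the ParseException is wrapped in an unchecked exception only to keep the sketch short.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class StrippedDateParser {
    // Hypothetical helper: strips separators from BOTH the pattern and the value
    // so that they still line up, then parses the digits strictly.
    static Date parse(String userPattern, String value) {
        String pattern = userPattern.replaceAll("[^A-Za-z]", ""); // "dd.MM.yy HH:mm:ss" -> "ddMMyyHHmmss"
        String digits = value.replaceAll("[^0-9]", "");           // "02.09.12 23:59:59" -> "020912235959"
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false); // reject impossible values such as day 32
        try {
            return format.parse(digits);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + value + "' as " + userPattern, e);
        }
    }

    public static void main(String[] args) {
        Date a = parse("dd.MM.yy HH:mm:ss", "02.09.12 23:59:59");
        Date b = parse("ddMMyyHHmmss", "020912235959");
        System.out.println(a.equals(b)); // both spellings describe the same instant
    }
}
```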

SimpleDateFormat solves your issue. The format may contain separators such as dots, colons or spaces, and the input is simply parsed according to that format:

http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

So you map your format (e.g. ddMMyyHHmmss) to a corresponding SimpleDateFormat.

SimpleDateFormat format = new SimpleDateFormat("ddMMyyHHmmss");
Date x = format.parse("010912000000");

If the format changes, you simply change the pattern passed to SimpleDateFormat.
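
To illustrate a format that contains separators, here is a small sketch that hands a user-supplied pattern to SimpleDateFormat unchanged and prints the result in a fixed layout. The pattern "dd.MM.yy HH:mm:ss" and the names SeparatedFormatExample and normalize are made up for this example; wrapping the ParseException in an unchecked exception is just a convenience.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

final class SeparatedFormatExample {
    // Parses 'value' with the user's pattern and re-formats it as yyyy-MM-dd HH:mm:ss.
    static String normalize(String userPattern, String value) {
        try {
            SimpleDateFormat in = new SimpleDateFormat(userPattern, Locale.ROOT);
            in.setLenient(false); // fail instead of silently rolling over invalid fields
            Date parsed = in.parse(value);
            return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse '" + value + "' as " + userPattern, e);
        }
    }

    public static void main(String[] args) {
        // The separators in the pattern must match the separators in the input.
        System.out.println(normalize("dd.MM.yy HH:mm:ss", "02.09.12 23:59:59"));
        System.out.println(normalize("ddMMyyHHmmss", "020912235959"));
    }
}
```

Both calls describe the same moment, so both lines of output are identical.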

You can use a series of date-time formats, trying each until one works.

You may need to order the formats to prioritize matches.

For example, with Joda-Time, you can call DateTimeFormat.forPattern() and DateTimeFormatter.getParser() for each of a series of patterns, then try DateTimeParser.parseInto() on each parser until one succeeds.

One nice thing about this approach is that it is easy to add and remove patterns.
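
The same "try each pattern until one works" idea can be sketched with the JDK's java.time classes, used here instead of Joda-Time so that the example needs no extra library. The list of candidate patterns is purely hypothetical, and so are the class and method names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class MultiPatternParser {
    // Hypothetical candidates; put the most specific or most likely patterns first.
    static final List<String> PATTERNS = Arrays.asList(
            "ddMMyyHHmmss",
            "dd.MM.yy HH:mm:ss",
            "dd.MM.yyyy HH:mm");

    // Returns the result of the first pattern that accepts the whole input.
    static Optional<LocalDateTime> parse(String text) {
        for (String pattern : PATTERNS) {
            try {
                return Optional.of(LocalDateTime.parse(text, DateTimeFormatter.ofPattern(pattern)));
            } catch (DateTimeParseException e) {
                // This pattern does not fit; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("010912000000"));
        System.out.println(parse("01.09.12 00:00:00"));
        System.out.println(parse("not a date"));
    }
}
```

Adding or removing a supported format is then a one-line change to the list.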

Use the Pattern and Matcher classes.

Look at the example:

String inputDate = "01.09.12.00:00:00";
Pattern pattern = Pattern.compile(
  "([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})");
Matcher matcher = pattern.matcher(inputDate);
if (matcher.find()) {
  // Concatenate the six two-digit groups into a separator-free string.
  StringBuilder cleanStr = new StringBuilder();
  for (int i = 1; i <= matcher.groupCount(); i++) {
    cleanStr.append(matcher.group(i));
  }
  SimpleDateFormat format = new SimpleDateFormat("ddMMyyHHmmss");
  Date x = format.parse(cleanStr.toString()); // may throw ParseException
  System.out.println(x);
}

The most important part is this line:

Pattern pattern = Pattern.compile(
  "([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})");

Here you define the regular expression and mark groups with parentheses, so ([0-9]{2}) marks a group of two digits. Each group is followed by an expression for a possible delimiter, [\\.]{0,1}, which in this case means zero or one dot. You can allow more delimiters by listing them in the character class, for example something like [\\.:]{0,1} for a dot or a colon.

Then you call matcher.find(), which returns true if the pattern matches somewhere in the input. After that, matcher.group(int) returns the groups one by one. Note that the index of the first group is 1; group 0 is the entire match.

Finally, I build a clean date String using a StringBuilder and parse it.

Cheers, Michal
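
The group-based approach can also be applied to the file name as a whole. The sketch below turns a file format such as *GADGET*-*DATE_FROM*-*DATE_TO*.csv into a regular expression with one capturing group per placeholder. It assumes that placeholders are upper-case names wrapped in asterisks, as in the question; the class FileNameTemplate and its method are hypothetical. Numbered groups are used because Java's named groups do not allow underscores in the name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameTemplate {
    // Assumed placeholder syntax: *NAME*, with NAME made of upper-case letters and underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\*([A-Z_]+)\\*");

    // Returns placeholder name -> extracted text, in template order.
    static Map<String, String> extract(String template, String fileName) {
        List<String> names = new ArrayList<>();
        StringBuilder regex = new StringBuilder();
        Matcher p = PLACEHOLDER.matcher(template);
        int last = 0;
        while (p.find()) {
            regex.append(Pattern.quote(template.substring(last, p.start()))); // literal text
            regex.append("(.+?)");                                            // placeholder value
            names.add(p.group(1));
            last = p.end();
        }
        regex.append(Pattern.quote(template.substring(last)));

        Matcher m = Pattern.compile(regex.toString()).matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException(fileName + " does not match " + template);
        }
        Map<String, String> values = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            values.put(names.get(i), m.group(i + 1)); // group indexes start at 1
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("*GADGET*-*DATE_FROM*-*DATE_TO*.csv",
                "Gadget1-010912000000-020912235959.csv"));
    }
}
```

The two date strings can then be parsed with the user's date format. Note that the lazy (.+?) groups become ambiguous if the date format itself contains the separator used between the placeholders (here a hyphen); in that case the date groups would need a stricter expression.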
