
Extracting dates from string

I have a list of file names that look roughly like this: Gadget1-010912000000-020912235959.csv, i.e. they contain two dates indicating the timespan of their data.

The user enters a date format and a file format:

  • File format in this case: *GADGET*-*DATE_FROM*-*DATE_TO*.csv
  • Date format in this case: ddMMyyHHmmss

What I want to do is extract the three values from the file name using the given file and date formats.

My problem is: since the date format can differ heavily (hours, minutes and seconds can be separated by a colon, dates by a dot, ...), I don't quite know how to create a fitting regular expression.

You can use a regular expression to remove the non-digit characters and then parse the value.

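import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

DateFormat dateFormat = new SimpleDateFormat("ddMMyyHHmmss");

String[] fileNameDetails = "Gadget1-010912000000-020912235959".split("-");

// Strip every non-digit character; a string that is already all digits is unchanged.
String date = fileNameDetails[1].replaceAll("[^0-9]", "");

try {
    // Parse the cleaned string, not the raw token.
    Date from = dateFormat.parse(date);
    System.out.println(from);
} catch (ParseException e) {
    System.err.println("Unparseable date: " + date);
}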

Hope it helps.

SimpleDateFormat solves your issue. You can define the format with commas, spaces or any other separators and simply parse according to that format:

http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

So you map your format (e.g. ddMMyyHHmmss) to a corresponding SimpleDateFormat.

SimpleDateFormat format = new SimpleDateFormat("ddMMyyHHmmss");
Date x = format.parse("010912000000");

If the format changes, you simply change the pattern passed to SimpleDateFormat.
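To connect this to the file format from the question, here is a minimal sketch that turns the template into a regular expression and then parses the date tokens with the user-supplied pattern. The class name FileNameParser and the helpers toPattern, split and parseDate are hypothetical, and the sketch assumes placeholders are written as *UPPER_CASE* and that the literal separators of the template do not occur inside the values themselves.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNameParser {
    // Hypothetical helper: turns "*GADGET*-*DATE_FROM*-*DATE_TO*.csv" into a regex
    // with one capturing group per *PLACEHOLDER*; all other text is matched literally.
    static Pattern toPattern(String fileFormat) {
        Matcher m = Pattern.compile("\\*[A-Z_]+\\*").matcher(fileFormat);
        StringBuilder regex = new StringBuilder();
        int last = 0;
        while (m.find()) {
            regex.append(Pattern.quote(fileFormat.substring(last, m.start())));
            regex.append("(.+?)");
            last = m.end();
        }
        regex.append(Pattern.quote(fileFormat.substring(last)));
        return Pattern.compile(regex.toString());
    }

    // Returns the captured values in template order, e.g. {gadget, from, to}.
    static String[] split(String fileFormat, String fileName) {
        Matcher m = toPattern(fileFormat).matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("File name does not fit format: " + fileName);
        }
        String[] parts = new String[m.groupCount()];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = m.group(i + 1); // group 0 is the whole match
        }
        return parts;
    }

    // Parses one date token strictly; wraps the checked ParseException.
    static Date parseDate(String dateFormat, String value) {
        SimpleDateFormat format = new SimpleDateFormat(dateFormat);
        format.setLenient(false);
        try {
            return format.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + value, e);
        }
    }
}
```

Calling FileNameParser.split("*GADGET*-*DATE_FROM*-*DATE_TO*.csv", "Gadget1-010912000000-020912235959.csv") yields the three tokens, and the second and third can then be passed to parseDate together with "ddMMyyHHmmss". If the date format itself contains the template's separator (a hyphen here), the reluctant groups may split in the wrong place, so this sketch would need a stricter group expression.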

You can use a series of date-time formats, trying each until one works.

You may need to order the formats to prioritize matches.

For example, with Joda-Time, you can use DateTimeFormat.forPattern() and DateTimeFormatter.getParser() for each of a series of patterns. Try DateTimeParser.parseInto() until one succeeds.

One nice thing about this approach is that it is easy to add and remove patterns.
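The try-each-format idea can be sketched without Joda-Time as well. The following uses java.text.SimpleDateFormat from the standard library instead of the Joda classes named above, so it is a substitute illustration rather than the Joda approach itself. MultiFormatParser and parseFirstMatch are hypothetical names, and the list of patterns is assumed to be supplied in priority order.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;

class MultiFormatParser {
    // Tries each pattern in order and returns the first date that consumes
    // the whole input; returns null if no pattern fits.
    static Date parseFirstMatch(String value, List<String> patterns) {
        for (String pattern : patterns) {
            SimpleDateFormat format = new SimpleDateFormat(pattern);
            format.setLenient(false); // reject out-of-range fields such as day 32
            ParsePosition pos = new ParsePosition(0);
            Date parsed = format.parse(value, pos); // null on failure, no exception
            if (parsed != null && pos.getIndex() == value.length()) {
                return parsed;
            }
        }
        return null;
    }
}
```

For instance, with the patterns "dd.MM.yy HH:mm:ss" and "ddMMyyHHmmss", both "01.09.12 00:00:00" and "010912000000" should resolve to the same instant. Checking that the parse position reached the end of the input avoids accepting a pattern that only matches a prefix.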

Use the Pattern and Matcher classes.

Look at the example:

String inputDate = "01.09.12.00:00:00";
// Six two-digit groups, each optionally followed by a dot or colon delimiter.
Pattern pattern = Pattern.compile(
  "([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})");
Matcher matcher = pattern.matcher(inputDate);
if (matcher.find()) {
  // Concatenate the captured groups (indices start at 1) into "010912000000".
  StringBuilder cleanStr = new StringBuilder();
  for (int i = 1; i <= matcher.groupCount(); i++) {
    cleanStr.append(matcher.group(i));
  }
  SimpleDateFormat format = new SimpleDateFormat("ddMMyyHHmmss");
  Date x = format.parse(cleanStr.toString()); // may throw ParseException
  System.out.println(x.toString());
}

The most important part is this line:

Pattern pattern = Pattern.compile(
  "([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[\\.]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})[:]{0,1}([0-9]{2})");

Here you define the regular expression and mark groups with parentheses, so ([0-9]{2}) marks a group. Each group is followed by an expression for a possible delimiter, [\\.]{0,1}, which in this case means zero or one dot. You can allow more delimiters by adding characters to the class, for example [\\.:]{0,1}, which accepts a dot or a colon. Note that | inside a character class is a literal character, not an alternation.

Then you run matcher.find(), which returns true if the pattern is found in the input. After that, matcher.group(int) lets you retrieve each group in turn. Note that the index of the first group is 1; group 0 is the whole match.

Then I build a clean date String using StringBuilder and parse it.

Cheers, Michal
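Since the date format in the question is entered by the user, a hand-written pattern may not be enough. One possible generalization, shown here only as a rough sketch, is to derive the regular expression from the date format itself. DateFormatRegex and toRegex are hypothetical names, and the sketch assumes every pattern letter stands for exactly one digit (so no text fields such as MMM, no quoted literals, and no variable-width fields).

```java
import java.util.regex.Pattern;

class DateFormatRegex {
    // Hypothetical helper: every pattern letter becomes \d, and every other
    // character (dot, colon, space, ...) is matched literally.
    static String toRegex(String dateFormat) {
        StringBuilder regex = new StringBuilder();
        for (char c : dateFormat.toCharArray()) {
            if (Character.isLetter(c)) {
                regex.append("\\d");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return regex.toString();
    }
}
```

The resulting expression can be used to locate or validate the date token before handing it to SimpleDateFormat with the same pattern. It only checks the shape of the token; whether the digits form a real date is still decided by the parser.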
