How do I parse a date from a URL format?

Question

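My database contains URLs stored as text fields. Each URL contains a representation of the report's date, which is missing from the report itself.

So I need to parse the date from the URL field into a String representation such as: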

2010-10-12
2007-01-03
2008-02-07

What is the best way to extract the date?

Some URLs are in the following format:

http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html

http://e.com/data/invoices/2010/09/invoices-report-thursday-september-2-2010.html

http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html

http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html

http://e.com/data/invoices/2010/08/invoices-report-monday-august-30th-2010.html

http://e.com/data/invoices/2009/05/invoices-report-friday-may-8th-2009.html

http://e.com/data/invoices/2010/10/invoices-report-wednesday-october-6th-2010.html

http://e.com/data/invoices/2010/09/invoices-report-tuesday-september-21-2010.html

Note the inconsistent use of "th" after the day of the month, as in these two cases:

http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html

http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html

Others are in this format (three hyphens before the date starts, no year at the end, and an optional invoices- before report):

http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-1.html

http://e.com/data/invoices/2010/09/invoices-report---thursday-september-2.html

http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-15.html

http://e.com/data/invoices/2010/09/invoices-report---monday-september-13.html

http://e.com/data/invoices/2010/08/report---monday-august-30.html

http://e.com/data/invoices/2009/05/report---friday-may-8.html

http://e.com/data/invoices/2010/10/report---wednesday-october-6.html

http://e.com/data/invoices/2010/09/report---tuesday-september-21.html

Answer 1

You want a regular expression like this:

"^http://e.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})"

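This relies on two facts: everything up to and including the /year/month/ part of the URL is always the same, and no digit appears after that until the day of the month. Once you have those three values, you no longer care about anything else in the URL.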

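The first capture group is the year, the second is the month, and the third is the day. The day may lack a leading zero; either convert the string to an integer and format it as needed, or check the string length and, if it is not two, prepend the string "0".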
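As a rough sketch of the "convert to integer and format" option, the three capture groups can be turned into the yyyy-MM-dd strings the question asks for. The class name UrlDateFormatter and the method extractDate are hypothetical names chosen for illustration, and returning null for a non-matching URL is an assumption rather than part of the original answer.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlDateFormatter {
    // Same pattern as above: year, month, then the first run of digits in the file name.
    private static final Pattern DATE_IN_URL =
        Pattern.compile("^http://e.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})");

    // Hypothetical helper: returns "yyyy-MM-dd", or null if the URL does not match.
    static String extractDate(String url) {
        Matcher m = DATE_IN_URL.matcher(url);
        if (!m.find()) {
            return null;
        }
        int year = Integer.parseInt(m.group(1));
        int month = Integer.parseInt(m.group(2));
        int day = Integer.parseInt(m.group(3));
        // %02d pads the month and day with a leading zero where needed.
        return String.format(Locale.ROOT, "%04d-%02d-%02d", year, month, day);
    }

    public static void main(String[] args) {
        System.out.println(extractDate(
            "http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html"));
        System.out.println(extractDate(
            "http://e.com/data/invoices/2009/05/report---friday-may-8.html"));
    }
}
```

Both URL styles from the question go through the same pattern, because the year and month always come from the path and never from the file name.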

For example:

import java.util.regex.*;

class URLDate {
  public static void main(String[] args) {
    String text = "http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html";
    // Groups: 1 = year, 2 = month, 3 = day of the month.
    String regex = "http://e.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(text);
    if (m.find()) {
      int count = m.groupCount();
      System.out.println("matched with groups:");
      // Group 0 is the whole matched text; groups 1..count are the captures.
      for (int i = 0; i <= count; ++i) {
        String group = m.group(i);
        System.out.format("\t%d: %s\n", i, group);
      }
    } else {
      System.out.println("failed to match!");
    }
  }
}

This gives the output:

matched with groups:
    0: http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1
    1: 2010
    2: 09
    3: 1

(Note that to use Matcher.matches() instead of Matcher.find(), you must append .*$ to the pattern so that it matches the entire input string.)
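The difference between the two calls can be sketched as follows. The class name FindVersusMatches and its helper methods are hypothetical and exist only for illustration; the regex and the sample URL are taken from the thread.

```java
import java.util.regex.Pattern;

class FindVersusMatches {
    static final String PREFIX_REGEX =
        "http://e.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})";

    // find() succeeds as soon as some part of the input matches the pattern.
    static boolean foundByFind(String regex, String url) {
        return Pattern.compile(regex).matcher(url).find();
    }

    // matches() succeeds only if the pattern covers the whole input.
    static boolean matchedWhole(String regex, String url) {
        return Pattern.compile(regex).matcher(url).matches();
    }

    public static void main(String[] args) {
        String url = "http://e.com/data/invoices/2010/09/report---tuesday-september-21.html";
        System.out.println(foundByFind(PREFIX_REGEX, url));          // true
        System.out.println(matchedWhole(PREFIX_REGEX, url));         // false: ".html" is left over
        System.out.println(matchedWhole(PREFIX_REGEX + ".*$", url)); // true
    }
}
```

In both successful cases the capture groups hold the same year, month, and day; only group 0 differs, since it contains whatever text the pattern as a whole consumed.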

Solution 1
6 Accepted 2010-10-19 17:21:35
