
How to parse a date from a URL format?

My database contains URLs stored as text fields. Each URL encodes the date of a report, and that date is missing from the report itself.

So I need to parse the date out of each URL into a yyyy-MM-dd String such as:

2010-10-12
2007-01-03
2008-02-07

What's the best way to extract the dates?

Some are in this format:

http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html

http://e.com/data/invoices/2010/09/invoices-report-thursday-september-2-2010.html

http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html

http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html

http://e.com/data/invoices/2010/08/invoices-report-monday-august-30th-2010.html

http://e.com/data/invoices/2009/05/invoices-report-friday-may-8th-2009.html

http://e.com/data/invoices/2010/10/invoices-report-wednesday-october-6th-2010.html

http://e.com/data/invoices/2010/09/invoices-report-tuesday-september-21-2010.html

Note the inconsistent use of an ordinal suffix such as th after the day of the month, as in these two:

http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-15-2010.html

http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html

Others are in this format (three hyphens before the date starts, no year at the end, and an optional invoices- prefix before report):

http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-1.html

http://e.com/data/invoices/2010/09/invoices-report---thursday-september-2.html

http://e.com/data/invoices/2010/09/invoices-report---wednesday-september-15.html

http://e.com/data/invoices/2010/09/invoices-report---monday-september-13.html

http://e.com/data/invoices/2010/08/report---monday-august-30.html

http://e.com/data/invoices/2009/05/report---friday-may-8.html

http://e.com/data/invoices/2010/10/report---wednesday-october-6.html

http://e.com/data/invoices/2010/09/report---tuesday-september-21.html

You want a regex like this:

"^http://e\\.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})"

This exploits two facts: everything up to and including the /year/month/ part of the URL always has the same layout, and no digit appears after it until the day of the month. Once you have captured the day, you can ignore the rest of the URL.

The first capture group is the year, the second the month, and the third the day. The day might not have a leading zero, so either convert it from a string to an integer and format it as needed, or check the string length and, if it is only one character, prepend "0".
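Both options can be sketched in Java as follows, assuming the year, month and day have already been pulled out of the capture groups as strings. The class name DayPadding and its method names are hypothetical. The LocalDate route has the side benefit of throwing a DateTimeException for impossible dates such as September 31, which the plain string-padding route would let through.

```java
import java.time.LocalDate;

// Hypothetical helper class; the names are for illustration only.
class DayPadding {
  // Option 1: convert to integers and let java.time format the date.
  // LocalDate.toString() yields ISO-8601 (yyyy-MM-dd for four-digit years)
  // and LocalDate.of throws DateTimeException for impossible dates.
  static String viaLocalDate(String year, String month, String day) {
    return LocalDate.of(Integer.parseInt(year), Integer.parseInt(month), Integer.parseInt(day))
        .toString();
  }

  // Option 2: plain string handling; prepend "0" when the day has one digit.
  static String viaPadding(String year, String month, String day) {
    String paddedDay = day.length() == 1 ? "0" + day : day;
    return year + "-" + month + "-" + paddedDay;
  }

  public static void main(String[] args) {
    System.out.println(viaLocalDate("2010", "09", "1")); // 2010-09-01
    System.out.println(viaPadding("2009", "05", "8"));   // 2009-05-08
  }
}
```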

As an example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class URLDate {
  public static void main(String[] args) {
    String text = "http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html";
    // Groups: 1 = year, 2 = month, 3 = day of the month (first digits after /year/month/).
    String regex = "http://e\\.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(text);
    if (m.find()) {
      int count = m.groupCount();
      System.out.println("matched with groups:");
      // Group 0 is the whole match; groups 1..count are the capture groups.
      for (int i = 0; i <= count; ++i) {
        String group = m.group(i);
        System.out.format("\t%d: %s%n", i, group);
      }
    } else {
      System.out.println("failed to match!");
    }
  }
}

gives the output: 给出输出:

matched with groups:
    0: http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1
    1: 2010
    2: 09
    3: 1
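Because the pattern relies only on the /year/month/ path and the first digits after it, the same approach should also handle the report--- URLs from the question. Here is a minimal sketch that applies the pattern to a sample of both layouts and zero-pads the day into yyyy-MM-dd. It assumes the year and month in the path always agree with the date in the file name. The class name ReportUrlDates and the method extract are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; a sketch, not a drop-in implementation.
class ReportUrlDates {
  // Year and month come from the /yyyy/MM/ path; the day is the first run of
  // one or two digits after it, so ordinal suffixes (1st, 13th) and any
  // trailing year are simply ignored.
  private static final Pattern URL_DATE =
      Pattern.compile("^http://e\\.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2})");

  // Returns the date as yyyy-MM-dd, or null if the URL does not fit the layout.
  static String extract(String url) {
    Matcher m = URL_DATE.matcher(url);
    if (!m.find()) {
      return null;
    }
    int day = Integer.parseInt(m.group(3));
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    return String.format(Locale.ROOT, "%s-%s-%02d", m.group(1), m.group(2), day);
  }

  public static void main(String[] args) {
    String[] urls = {
      "http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html",
      "http://e.com/data/invoices/2010/09/invoices-report-monday-september-13th-2010.html",
      "http://e.com/data/invoices/2010/09/invoices-report---thursday-september-2.html",
      "http://e.com/data/invoices/2009/05/report---friday-may-8.html"
    };
    for (String url : urls) {
      System.out.println(extract(url));
    }
  }
}
```

Run as is, it prints 2010-09-01, 2010-09-13, 2010-09-02 and 2009-05-08, one per line.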

(Note that to use Matcher.matches() instead of Matcher.find(), you would have to make the pattern consume the entire input string, for example by appending .*$ to the pattern.)
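Here is a sketch of that matches() variant. The class name MatchesVariant and the method groups are hypothetical. With .*$ appended, matches() succeeds, group 0 becomes the entire URL, and groups 1 to 3 still hold the year, month and day.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; illustrates matches() versus find().
class MatchesVariant {
  // The trailing ".*$" lets the pattern consume the whole URL,
  // which matches() requires.
  private static final Pattern WHOLE_URL =
      Pattern.compile("http://e\\.com/data/invoices/(\\d{4})/(\\d{2})/\\D+(\\d{1,2}).*$");

  // Returns {whole match, year, month, day}, or null when the URL does not fit.
  static String[] groups(String url) {
    Matcher m = WHOLE_URL.matcher(url);
    if (!m.matches()) {
      return null;
    }
    return new String[] {m.group(0), m.group(1), m.group(2), m.group(3)};
  }

  public static void main(String[] args) {
    String[] g = groups("http://e.com/data/invoices/2010/09/invoices-report-wednesday-september-1st-2010.html");
    System.out.println(g[1] + "-" + g[2] + "-" + g[3]); // 2010-09-1
  }
}
```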
