
Using regex to find date string in file

I need to find a specific date string in a text file. There are currently two date strings in the file: "Due Date: 01/26/2016" and "Date: 01/252016". I need to find the second one, but my current code only finds the first. I am guessing regex would be a better approach, but I'm not sure how to write it.

Current code:

searchString = "Date:";
if (fileContents.IndexOf(searchString) > 0)
{
    string tmp = fileContents.Substring(fileContents.IndexOf(searchString) + searchString.Length).Trim();
    string loan_date = tmp.Substring(0, tmp.IndexOf('\r')).Trim();
    if (loan_date.Count(x => x == '/') == 1)
    {
        StringBuilder sb = new StringBuilder(loan_date);
        sb[sb.Length - 4] = '/';
        loan_date = sb.ToString();
    }
    DateTime dt = DateTime.ParseExact(loan_date, "M/d/yyyy", System.Globalization.CultureInfo.InvariantCulture);
    return dt;
}

In C#, you can find the matches for a regex with something like the following.

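using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        // Optional 0 or 1, one digit, slash, two digits, slash, four digits.
        string pattern = "[0-1]?[0-9]/[0-9]{2}/[0-9]{4}";
        string input = "Due Date: 01/26/2016 Date: 01/25/2016";

        // Declare the loop variable as Match: on .NET Framework, MatchCollection
        // only exposes the non-generic IEnumerable, so 'var' would yield object.
        foreach (Match m in Regex.Matches(input, pattern))
        {
            Console.WriteLine("'{0}' found at index {1}.",
                       m.Value, m.Index);
        }
    }
}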

That regex means: an optional 0 or 1, followed by a digit, a slash, two digits, another slash, and four digits.
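To make that breakdown easier to follow, the same pattern can be written across several lines with a comment on each piece. This is only a sketch: the class name VerbosePatternDemo and the field DatePattern are made up for illustration, and it relies on RegexOptions.IgnorePatternWhitespace, which ignores unescaped whitespace in the pattern and treats # as the start of a comment.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbosePatternDemo
{
    // Same pattern as "[0-1]?[0-9]/[0-9]{2}/[0-9]{4}", one piece per line.
    // (Hypothetical names; only the pattern itself comes from the answer.)
    public static readonly Regex DatePattern = new Regex(@"
        [0-1]?      # optional leading 0 or 1
        [0-9]       # one digit that completes the month
        /           # literal slash
        [0-9]{2}    # two-digit day
        /           # literal slash
        [0-9]{4}    # four-digit year
        ", RegexOptions.IgnorePatternWhitespace);

    public static void Main()
    {
        string input = "Due Date: 01/26/2016 Date: 01/25/2016";

        // Print every date-shaped substring together with its position.
        foreach (Match m in DatePattern.Matches(input))
        {
            Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
        }
    }
}
```

The pattern only checks the shape of the text, so something like 19/99/2016 would still match.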

I'm also assuming the second date, 01/252016, contains a typo.
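If the missing slash is not a typo and the file really can contain 01/252016, a possible sketch is to make the second slash optional and to skip any "Date:" that is part of "Due Date:" with a negative lookbehind. The helper name FindLoanDate and the sample file contents are assumptions for illustration, not something from the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LoanDateFinder
{
    // (?<!Due\s) rejects "Date:" when it is directly preceded by "Due ".
    // /? makes the second slash optional, so "01/252016" is accepted too.
    private static readonly Regex LoanDatePattern = new Regex(
        @"(?<!Due\s)Date:\s*([0-9]{1,2})/([0-9]{1,2})/?([0-9]{4})");

    // Hypothetical helper: returns the date after a standalone "Date:" label,
    // or null when no such label is found or the value is not a real date.
    public static DateTime? FindLoanDate(string fileContents)
    {
        Match m = LoanDatePattern.Match(fileContents);
        if (!m.Success) return null;

        // Rebuild the text with both slashes, then let DateTime validate it.
        string normalized = string.Format("{0}/{1}/{2}",
            m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value);

        DateTime parsed;
        bool ok = DateTime.TryParseExact(normalized, "M/d/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
        return ok ? parsed : (DateTime?)null;
    }

    public static void Main()
    {
        // Assumed sample input that mirrors the two lines from the question.
        string fileContents = "Due Date: 01/26/2016\r\nDate: 01/252016\r\n";
        Console.WriteLine(FindLoanDate(fileContents));
    }
}
```

Only the first standalone "Date:" is examined here; if that value fails to parse, the sketch returns null rather than looking further.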

Try this regex:

(Due\s)?(Date:)\s(0[1-9]|1[0-2])\/([0-3][0-9])\/([0-2][0-9]{3})

Since both strings include "Date", we can use that to filter out other strings (you might not actually want every date in the file). Since "Due" is optional, we can mark it as such. It's a little tough to filter out poorly formatted dates with a regex alone, but you can limit a few things, as in the pattern above. You will still have to validate the date separately to be sure.
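The separate validation step could look roughly like this: let the regex find date-shaped text after the label, then hand each candidate to DateTime.TryParseExact, which rejects impossible dates such as 02/30/2016. The class and method names are hypothetical, and the pattern used is a shape-only one (two digits, slash, two digits, slash, four digits) chosen for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateValidator
{
    // Groups: 1 = optional "Due ", 2 = "Date:", 3 = month, 4 = day, 5 = year.
    private static readonly Regex LabeledDate = new Regex(
        @"(Due\s)?(Date:)\s([0-9]{2})/([0-9]{2})/([0-9]{4})");

    // Hypothetical helper: keeps only the matches that are real calendar dates.
    public static List<DateTime> ExtractValidDates(string input)
    {
        var result = new List<DateTime>();
        foreach (Match m in LabeledDate.Matches(input))
        {
            string candidate = m.Groups[3].Value + "/" +
                               m.Groups[4].Value + "/" +
                               m.Groups[5].Value;

            // TryParseExact returns false instead of throwing on bad dates.
            DateTime parsed;
            if (DateTime.TryParseExact(candidate, "MM/dd/yyyy",
                    CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed))
            {
                result.Add(parsed);
            }
        }
        return result;
    }

    public static void Main()
    {
        // The second value has the right shape but is not a real date.
        string input = "Due Date: 01/26/2016 Date: 02/30/2016";
        foreach (DateTime d in ExtractValidDates(input))
        {
            Console.WriteLine(d.ToString("yyyy-MM-dd"));
        }
    }
}
```

Whether a match came from "Due Date:" or plain "Date:" can be read from m.Groups[1].Success if the two need to be told apart.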

Here is a regex that skips those range checks and only requires the right format:

(Due\s)?(Date:)\s([0-9]{2})\/([0-9]{2})\/([0-9]{4})

Or just the dates:

([0-9]{2})\/([0-9]{2})\/([0-9]{4})
