
Most loose way to parse a date/time in C#?

I'm parsing a broad range of RSS feeds, and apparently they each use their own format for the article timestamp.

Now we even found one that uses local words, like Donderdag 17 juli 2018.

At the moment we have a fallback mechanism where we just fall back to DateTime.UtcNow when we can't parse the date.

Still, I would like to make a best attempt. What is the best way to really loosely parse a DateTime in C#, so that it can handle, e.g.:

  • 13-11-2018 14.32
  • donderdag 13 november 2018, 14:32
  • 13 nov 2018
  • 14:32 13.11.2018
  • 2018-11-13T16:32:00+2:00

etc. I know that this would not be foolproof, but I would still like to make a best attempt.

Is there any recommended way? Or do I have to roll my own?

You could use DateTime.TryParseExact and include all the expected formats.

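// Requires: using System; using System.Globalization;
// List every format you expect to see; the last entry is a placeholder.
var formats = new[] { "dd-MM-yyyy HH.mm", "dddd dd MMMM yyyy, HH:mm", "more formats here" };
DateTime result;
if (DateTime.TryParseExact(input, formats, CultureInfo.CreateSpecificCulture("nl-NL"), DateTimeStyles.None, out result)) {
  Console.WriteLine("Succeeded " + result);
}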

The only big "gotcha" here is date formats where the day and the month are in ambiguous positions. I do not see any in your examples, but if you were to mix cultures in one stream it could become a problem. For example, the US generally starts a formatted date with the month, while the Netherlands starts it with the day of the month. If this is a problem, there is no way to handle it dynamically in your use case unless you also get the culture from the RSS stream, in which case you could try to create a set of culture-specific parsing rules.
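If the feed does declare its language (for instance through the RSS `<language>` element), such culture-specific rules could be sketched roughly as follows. The `FeedDateParser` class, the `FormatsByCulture` table, and the format lists in it are all made up for illustration, and culture data can differ between platforms, so treat this as a starting point rather than a finished parser.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class FeedDateParser
{
    // Hypothetical rule table: culture name -> formats expected from feeds in that culture.
    private static readonly Dictionary<string, string[]> FormatsByCulture =
        new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
        {
            ["nl-NL"] = new[] { "dd-MM-yyyy HH.mm", "dddd d MMMM yyyy, HH:mm", "HH:mm dd.MM.yyyy" },
            ["en-US"] = new[] { "MM-dd-yyyy HH:mm", "dddd, MMMM d, yyyy h:mm tt" },
        };

    // cultureName is assumed to come from the feed itself (e.g. "nl-NL").
    public static bool TryParse(string input, string cultureName, out DateTime result)
    {
        result = default(DateTime);

        // No rules registered for this culture: report failure instead of guessing.
        if (!FormatsByCulture.TryGetValue(cultureName, out var formats)) return false;

        CultureInfo culture;
        try { culture = CultureInfo.GetCultureInfo(cultureName); }
        catch (CultureNotFoundException) { return false; }

        // Only the formats registered for this culture are tried.
        return DateTime.TryParseExact(input, formats, culture, DateTimeStyles.None, out result);
    }
}
```

With this table, `FeedDateParser.TryParse("13-11-2018 14.32", "nl-NL", out var d)` reads the 13 as the day, while the same string is rejected for "en-US" because 13 is not a valid month there.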

This suggestion is not specific to date times, but you could try parser combinators, especially if you decide to roll your own solution. There are multiple libraries for .NET, Sprache for example.

Loosely parsing date times from mixed sources of data is probably not a good idea. Some things, like Microsoft's text-to-speech, may try, but it can sometimes have the effect of reading consecutive dates as

October first, November first, December first, January thirteenth, etc.

The only way loose parsing can be made somewhat reliable is if one can use other cues to associate dates with whatever wrote them. If you have a bunch of dates that occur at the top level of a particular feed, and you find that all parsing patterns that work for all of them yield the same results, then it's likely that the parsing pattern is parsing the dates correctly. The biggest parts of such an endeavor, however, will likely not be parsing the dates, but rather (1) ensuring that dates written in different formats get grouped separately, and (2) providing a means by which an operator can assist the program in places where it has trouble.
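A minimal sketch of the per-feed idea: collect the date strings from one feed and keep only the candidate formats that parse every one of them. The `FeedFormatDetector` name and the candidate formats are assumptions for illustration, and numeric formats are parsed with the invariant culture to keep the example simple. If more than one format survives, their parsed results would still need to be compared.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class FeedFormatDetector
{
    // Returns the candidate formats that successfully parse every date seen in one feed.
    // Note: for an empty feedDates collection every candidate is returned.
    public static List<string> FormatsThatParseAll(
        IReadOnlyCollection<string> feedDates, IEnumerable<string> candidateFormats)
    {
        return candidateFormats
            .Where(format => feedDates.All(date => DateTime.TryParseExact(
                date, format, CultureInfo.InvariantCulture, DateTimeStyles.None, out _)))
            .ToList();
    }
}
```

For a feed containing "01-02-2018" and "13-02-2018", the candidates "dd-MM-yyyy" and "MM-dd-yyyy" both fit the first date, but only "dd-MM-yyyy" fits the second, so a single unambiguous date in the feed settles the format for the rest.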

Incidentally, I don't know if any date parsing programs make use of attached weekdays as part of format validation, but they could often help. For example, "2-1-2018" could either be January 2 or February 1, but "Thursday 2-1-2018" could only be the latter. When parsing numeric dates from a source whose format isn't fully established, it may be helpful to determine what the weekday would be under each method of parsing, and to check whether the input contains something that looks like a weekday matching one but not the other.
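The weekday check could look something like the sketch below. It assumes the weekday has already been split off from the numeric part and is written in English. The `WeekdayDisambiguator` name and the two candidate formats are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class WeekdayDisambiguator
{
    // Tries both day-first and month-first readings and keeps those whose
    // computed weekday matches the weekday written next to the date.
    public static List<DateTime> Resolve(string weekdayName, string numericDate)
    {
        var matches = new List<DateTime>();
        foreach (var format in new[] { "d-M-yyyy", "M-d-yyyy" })
        {
            if (DateTime.TryParseExact(numericDate, format, CultureInfo.InvariantCulture,
                    DateTimeStyles.None, out var candidate)
                && string.Equals(candidate.DayOfWeek.ToString(), weekdayName,
                    StringComparison.OrdinalIgnoreCase)
                && !matches.Contains(candidate))
            {
                matches.Add(candidate);
            }
        }
        return matches;
    }
}
```

`Resolve("Thursday", "2-1-2018")` leaves only February 1, 2018, and `Resolve("Tuesday", "2-1-2018")` leaves only January 2, 2018. An empty list means the weekday contradicts both readings, which is itself a useful signal that the input is malformed.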

You can use the TryParse method to try to parse the strings, while looping through all cultures to capture any culture differences in the string. The following method will parse all standard formats for all cultures and return the date in the out parameter if it's found.

Note that the danger here is that some dates will have ambiguous month and day values (any number less than 13 could be a month or a day). In that case, the result will come from the first culture that matches, which may not be correct.

Here's the code:

public static bool TryParseAllCultures(string formattedDate, 
    out DateTime result)
{
    // First try in our local culture
    if (DateTime.TryParse(formattedDate, out result)) return true;

    foreach (var cultureInfo in CultureInfo.GetCultures(CultureTypes.AllCultures))
    {
        if (DateTime.TryParse(formattedDate, cultureInfo, DateTimeStyles.None, 
            out result))
        {
            return true;
        }
    }

    return false;
}

Sample usage

Note: I modified one of your dates because the date itself was invalid! The second date used to be "donderdag 13 november 2018", except the 13th is a dinsdag (Tuesday), not a donderdag (Thursday).

private static void Main()
{
    var dateFormats = new List<string>
    {
        "13-11-2018 14.32",
        "donderdag 15 november 2018, 14:32",
        "13 nov 2018",
        "14:32 13.11.2018",
        "2018-11-13T16:32:00+2:00"
    };

    DateTime result;

    foreach (var dateFormat in dateFormats)
    {
        if (TryParseAllCultures(dateFormat, out result))
        {
            Console.ForegroundColor = ConsoleColor.Green;
            Console.WriteLine($"SUCCESS: {dateFormat.PadRight(36, '.')} {result}");
        }
        else
        {
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine($"ERROR: Unable to parse format: {dateFormat}");
        }

        Console.ResetColor();
    }

    // Wait for a key press so the console window stays open.
    Console.Write("\nDone! Press any key to exit...");
    Console.ReadKey();
}
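Because the first matching culture wins in the method above, it can be useful to see whether different cultures disagree about a string. The following is a rough sketch of that check. `AmbiguityDetector` and `DistinctParses` are names made up here, and the exact results depend on the culture data available on the machine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class AmbiguityDetector
{
    // Collects every distinct DateTime that the given cultures produce for the input.
    // More than one entry means the cultures disagree, so the input is ambiguous.
    public static SortedSet<DateTime> DistinctParses(
        string formattedDate, IEnumerable<CultureInfo> cultures)
    {
        var results = new SortedSet<DateTime>();
        foreach (var culture in cultures)
        {
            if (DateTime.TryParse(formattedDate, culture, DateTimeStyles.None, out var parsed))
            {
                results.Add(parsed);
            }
        }
        return results;
    }
}
```

Called with en-US and en-GB, "01/02/2018" should come back as two different dates (January 2 and February 1). Passing `CultureInfo.GetCultures(CultureTypes.AllCultures)` gives the widest check, though cultures whose default calendar is not Gregorian may then contribute further results.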

