How to parse this string in the best way

I have several strings that look like this one:

\r\n\t\StaticWord1:\r\n\t\t2014-05-20 11:03\r\n\t\StaticWord2\r\n\t\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t\t\t\t\tWordC WordD\r\n\t\t\t\t\t\t\t\t\r\n\t\t\r\n\t\t\r\n\t\t\r\n\t

I would like to get the date (2014-05-20 11:03 in my example, but it will vary), WordC and WordD (both C and D can be any sequence of letters).

How would I parse this as efficiently as possible? I was thinking about using the String.Replace method, but I think a regex would be better? (C#)

Use this capture pattern:

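// Requires: using System; using System.Globalization; using System.Text.RegularExpressions;
// Capture the "yyyy-MM-dd HH:mm" timestamp. RegexOptions.Multiline is not needed
// because the pattern uses no ^ or $ anchors.
Match match = Regex.Match(input, @"(\d{4}-\d{2}-\d{2} \d{2}:\d{2})");
if (match.Success)
{
    string key = match.Groups[1].Value;
    // Parse with the invariant culture so the result does not depend on the machine's locale.
    DateTime date = DateTime.ParseExact(key, "yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture); // Your result is here
}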

I don't know if it's the best way, but you can use a split as in this MSDN example: http://msdn.microsoft.com/en-us/library/ms228388.aspx

Following that example, you can create an array of delimiters, split your string on \t, \n and \r, and then loop over the result to get all your words:

class TestStringSplit
{
    static void Main()
    {
        char[] delimiterChars = { '\r', '\n', '\t' };

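        // "\S" is not a valid C# escape sequence, so the stray backslash before each StaticWord is dropped here.
        string text = "\r\n\tStaticWord1:\r\n\t\t2014-05-20 11:03\r\n\tStaticWord2\r\n\t\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t\t\t\t\tWordC WordD\r\n\t\t\t\t\t\t\t\t\r\n\t\t\r\n\t\t\r\n\t\t\r\n\t";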
        System.Console.WriteLine("Original text: '{0}'", text);

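        // Skip the empty entries produced by consecutive delimiters.
        string[] words = text.Split(delimiterChars, System.StringSplitOptions.RemoveEmptyEntries);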
        System.Console.WriteLine("{0} words in text:", words.Length);

        foreach (string s in words)
        {
            System.Console.WriteLine(s);
        }

        // Keep the console window open in debug mode.
        System.Console.WriteLine("Press any key to exit.");
        System.Console.ReadKey();
    }
}
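
Building on the split approach, here is a minimal sketch of how the date and the two words could be picked out of the resulting tokens. It assumes the string contains real control characters (not literal backslash sequences), that the stray backslash before each StaticWord is not part of the data, and that the fields always appear in the same order. The names `SplitExtractSketch` and `Extract` are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Globalization;

static class SplitExtractSketch
{
    // Hypothetical helper. Assumes the non-empty tokens are always, in order:
    // "StaticWord1:", the date, "StaticWord2", and "WordC WordD".
    public static (DateTime Date, string WordC, string WordD) Extract(string text)
    {
        char[] delimiters = { '\r', '\n', '\t' };
        string[] tokens = text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);

        // The second token holds the timestamp.
        DateTime date = DateTime.ParseExact(
            tokens[1], "yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture);

        // The fourth token holds both words, separated by a space.
        string[] words = tokens[3].Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return (date, words[0], words[1]);
    }

    static void Main()
    {
        string text = "\r\n\tStaticWord1:\r\n\t\t2014-05-20 11:03\r\n\tStaticWord2\r\n\t\t\r\n\t\t\t\tWordC WordD\r\n\t\t\r\n\t";
        var result = Extract(text);
        Console.WriteLine(result.Date.ToString("yyyy-MM-dd HH:mm", CultureInfo.InvariantCulture));
        Console.WriteLine(result.WordC);
        Console.WriteLine(result.WordD);
    }
}
```

Indexing by position keeps the code short, but it throws if a field is missing, so a length check on `tokens` would be worth adding for untrusted input.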
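
You can also capture all three values with a single regex. Note that the input below is a verbatim string, so the pattern matches literal backslash sequences rather than control characters:

string input = @"\r\n\t\StaticWord1:\r\n\t\t2014-05-20 11:03\r\n\t\StaticWord2\r\n\t\t\r\n\t\t\r\n\t\t\t\r\n\t\t\t\r\n\t\t\t\t\t\t\t\t\tWordC WordD\r\n\t\t\t\t\t\t\t\t\r\n\t\t\r\n\t\t\r\n\t\t\r\n\t";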

string pattern = @"(\d{4}\-\d{2}\-\d{2}\s\d{2}:\d{2})(?:[\\r\\n\\t]*StaticWord2[\\r\\n\\t]*)(\w+)\s(\w+)";

Match match = Regex.Match(input, pattern);

Then to get the values:

string dateTime = match.Groups[1].Value;  // date-time
string wordC = match.Groups[2].Value;     // WordC
string wordD = match.Groups[3].Value;     // WordD
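
If the string holds actual carriage returns, line feeds and tabs rather than the literal backslash sequences of a verbatim string, a variant built on `\s` may be simpler. The sketch below shows one way to do that under that assumption, and it also assumes there is no stray backslash before StaticWord2. The names `RegexExtractSketch` and `TryExtract` and the group names are illustrative and do not come from the answers above.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexExtractSketch
{
    // Date, optional whitespace, the fixed label, then two letter-only words.
    private static readonly Regex Pattern = new Regex(
        @"(?<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s*StaticWord2\s*(?<c>\p{L}+)\s+(?<d>\p{L}+)");

    // Hypothetical helper: returns false (and empty strings) when nothing matches.
    public static bool TryExtract(string input, out string date, out string wordC, out string wordD)
    {
        Match m = Pattern.Match(input);
        date = m.Success ? m.Groups["date"].Value : string.Empty;
        wordC = m.Success ? m.Groups["c"].Value : string.Empty;
        wordD = m.Success ? m.Groups["d"].Value : string.Empty;
        return m.Success;
    }

    static void Main()
    {
        string input = "\r\n\tStaticWord1:\r\n\t\t2014-05-20 11:03\r\n\tStaticWord2\r\n\t\t\r\n\t\t\t\tWordC WordD\r\n\t\t\r\n\t";
        if (TryExtract(input, out string date, out string wordC, out string wordD))
        {
            Console.WriteLine(date);
            Console.WriteLine(wordC);
            Console.WriteLine(wordD);
        }
    }
}
```

Named groups keep the call site readable, and `\p{L}+` restricts the two words to letters, as the question describes.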
