Converting a MatchCollection to string array

Is there a better way than this to convert a MatchCollection to a string array?

MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");
string[] strArray = new string[mc.Count];
for (int i = 0; i < mc.Count; i++)
{
    strArray[i] = mc[i].Groups[0].Value;
}

PS: mc.CopyTo(strArray, 0) throws an exception:

At least one element in the source array could not be cast down to the destination array type.
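The cast fails because every element of a MatchCollection is a Match object, not a string, so CopyTo cannot store the elements in a string[]. A minimal sketch of a CopyTo-based workaround, written as C# 9 top-level statements with a made-up input string: copy into a Match[] first, then project each element to its matched text with Array.ConvertAll.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input, chosen only for illustration.
string strText = "don't stop-believing now";
MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");

// The elements are Match instances, so the destination array must be Match[].
Match[] matchArray = new Match[mc.Count];
mc.CopyTo(matchArray, 0);

// Project each Match to the text it matched.
string[] strArray = Array.ConvertAll(matchArray, m => m.Value);

Console.WriteLine(string.Join("|", strArray)); // prints: don't|stop-believing|now
```

This still walks the collection twice (once for CopyTo, once for ConvertAll), so it is mainly useful for understanding the exception rather than as a faster alternative.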

Try:

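using System.Linq;
using System.Text.RegularExpressions;

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .Cast<Match>()          // MatchCollection is non-generic, so cast each item to Match
    .Select(m => m.Value)   // keep only the matched text
    .ToArray();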

Dave Bish's answer is good and works properly.

It's worth noting, though, that replacing Cast<Match>() with OfType<Match>() will speed things up.

The code would become:

var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .OfType<Match>()
    .Select(m => m.Groups[0].Value)
    .ToArray();

The result is exactly the same (and addresses the OP's issue in exactly the same way), but for huge strings it's faster.
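A point worth keeping in mind when choosing between the two: Cast<T>() throws an InvalidCastException on an element that is not a T, while OfType<T>() silently skips such elements. Since a MatchCollection contains only Match objects, both produce the same result here. A small sketch (C# 9 top-level statements) showing where they diverge, using a hypothetical ArrayList with mixed element types:

```csharp
using System;
using System.Collections;
using System.Linq;

// A non-generic collection holding a mix of element types.
ArrayList mixed = new ArrayList { "alpha", 42, "beta" };

// OfType<string>() filters: the boxed int is skipped.
string[] filtered = mixed.OfType<string>().ToArray();
Console.WriteLine(string.Join(",", filtered)); // alpha,beta

// Cast<string>() converts every element and throws when it reaches the int.
bool castFailed = false;
try
{
    _ = mixed.Cast<string>().ToArray();
}
catch (InvalidCastException)
{
    castFailed = true;
}
Console.WriteLine("Cast threw: " + castFailed); // Cast threw: True
```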

Test code:

using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// put it in a console application
static void Test()
{
    Stopwatch sw = new Stopwatch();
    StringBuilder sb = new StringBuilder();
    string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";

    // Repeat the sentence 100,000 times to build one very long input string
    Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
    strText = sb.ToString();

    // Time the OfType<Match>() version
    sw.Start();
    var arr = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
              .OfType<Match>()
              .Select(m => m.Groups[0].Value)
              .ToArray();
    sw.Stop();

    Console.WriteLine("OfType: " + sw.ElapsedMilliseconds.ToString());
    sw.Reset();

    // Time the Cast<Match>() version
    sw.Start();
    var arr2 = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
              .Cast<Match>()
              .Select(m => m.Groups[0].Value)
              .ToArray();
    sw.Stop();
    Console.WriteLine("Cast: " + sw.ElapsedMilliseconds.ToString());
}

Output (elapsed milliseconds):

OfType: 6540
Cast: 8743

For very long strings, Cast() is therefore slower.

I ran the exact same benchmark that Alex posted and found that sometimes Cast was faster and sometimes OfType was faster, but the difference between the two was negligible. However, while ugly, the for loop is consistently faster than both of the other two.

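using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

Stopwatch sw = new Stopwatch();
StringBuilder sb = new StringBuilder();
string strText = "this will become a very long string after my code has done appending it to the stringbuilder ";
Enumerable.Range(1, 100000).ToList().ForEach(i => sb.Append(strText));
strText = sb.ToString();

// First two benchmarks (the OfType and Cast runs from Alex's code)

// Plain for loop over the MatchCollection
sw.Restart(); // discard any time left on the stopwatch from the previous benchmark
MatchCollection mc = Regex.Matches(strText, @"\b[A-Za-z-']+\b");
var matches = new string[mc.Count];
for (int i = 0; i < matches.Length; i++)
{
    matches[i] = mc[i].ToString(); // Match.ToString() returns the matched text
}
sw.Stop();
Console.WriteLine("For: " + sw.ElapsedMilliseconds.ToString());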

Results:

OfType: 3462
Cast: 3499
For: 2650
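Single-run timings like the ones above can be skewed by one-time costs (JIT compilation, building and caching the Regex) that land on whichever query runs first, which may be part of why the two answers disagree. A rough sketch of a fairer comparison, written as C# 9 top-level statements, under the assumptions that one untimed warm-up call is enough and that the best of five runs is a reasonable figure; the helper BestOf is hypothetical, and the input is 10,000 copies rather than 100,000 to keep it quick. It also checks that the three approaches return identical arrays. For serious measurements, a dedicated tool such as BenchmarkDotNet is the usual choice.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Smaller than the 100,000 copies used above, to keep the sketch quick to run.
int copies = 10000;
const int Repetitions = 5;
const string pattern = @"\b[A-Za-z-']+\b";

var sb = new StringBuilder();
for (int n = 0; n < copies; n++)
    sb.Append("this will become a very long string after my code has done appending it to the stringbuilder ");
string strText = sb.ToString();

// The three approaches compared in the answers.
string[] ViaOfType() => Regex.Matches(strText, pattern).OfType<Match>().Select(m => m.Value).ToArray();
string[] ViaCast() => Regex.Matches(strText, pattern).Cast<Match>().Select(m => m.Value).ToArray();
string[] ViaFor()
{
    MatchCollection mc = Regex.Matches(strText, pattern);
    var result = new string[mc.Count];
    for (int i = 0; i < result.Length; i++)
        result[i] = mc[i].Value;
    return result;
}

// Hypothetical helper: one untimed warm-up call (JIT, regex cache),
// then the fastest of several timed runs, in milliseconds.
long BestOf(Func<string[]> run)
{
    run();
    long best = long.MaxValue;
    for (int r = 0; r < Repetitions; r++)
    {
        var sw = Stopwatch.StartNew();
        run();
        sw.Stop();
        best = Math.Min(best, sw.ElapsedMilliseconds);
    }
    return best;
}

Console.WriteLine("OfType: " + BestOf(ViaOfType));
Console.WriteLine("Cast: " + BestOf(ViaCast));
Console.WriteLine("For: " + BestOf(ViaFor));

// Speed aside, all three must return the same words in the same order.
bool same = ViaOfType().SequenceEqual(ViaCast()) && ViaCast().SequenceEqual(ViaFor());
Console.WriteLine("Identical results: " + same);
```

The printed timings will vary by machine and runtime, so no particular ranking is implied here.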

One could also use this extension method to deal with the annoyance of MatchCollection not being generic. Not that it's a big deal, but this is almost certainly more performant than OfType or Cast, because it's just enumerating, which both of those also have to do.

(Side note: I wonder if it would be possible for the .NET team to make MatchCollection implement the generic versions of ICollection and IEnumerable in the future? Then we wouldn't need this extra step to have LINQ transforms available immediately.)

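using System.Collections.Generic;
using System.Text.RegularExpressions;
// Extension methods must live in a non-generic static class.
public static class MatchCollectionExtensions
{
    // Yields each Match so LINQ operators work without Cast/OfType.
    public static IEnumerable<Match> ToEnumerable(this MatchCollection mc)
    {
        if (mc != null) {
            foreach (Match m in mc)
                yield return m;
        }
    }
}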
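For what it's worth, newer runtimes did add this: starting with .NET Core 2.0, MatchCollection implements generic collection interfaces such as IEnumerable<Match> and IList<Match>, so LINQ operators can be applied to it directly. A minimal sketch (C# 9 top-level statements), assuming a .NET Core 2.0 or later target with a made-up input; on .NET Framework the collection is non-generic only, so Cast, OfType, or a wrapper like the extension method are still needed there.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input, chosen only for illustration.
string strText = "one two-three four";

// On .NET Core 2.0+ MatchCollection is an IEnumerable<Match>,
// so Select can be called on it without Cast/OfType.
string[] words = Regex.Matches(strText, @"\b[A-Za-z-']+\b")
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join("|", words)); // one|two-three|four
```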

Consider the following code...

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var emailAddress = "joe@sad.com; joe@happy.com; joe@elated.com";
// Collect the full text of every email address found in the string
List<string> emails = Regex.Matches(emailAddress, @"([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})")
                .Cast<Match>()
                .Select(m => m.Groups[0].Value)
                .ToList();
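The answers in this thread use m.Value, m.Groups[0].Value and mc[i].ToString() interchangeably. They are equivalent: group 0 always holds the entire match, and Match.ToString() returns the matched text. A quick check, written as C# 9 top-level statements, reusing the email input and pattern from the previous answer:

```csharp
using System;
using System.Text.RegularExpressions;

var emailAddress = "joe@sad.com; joe@happy.com; joe@elated.com";
MatchCollection mc = Regex.Matches(emailAddress,
    @"([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})");

foreach (Match m in mc)
{
    // Group 0 is the whole match; ToString() returns the same text as Value.
    bool sameText = m.Value == m.Groups[0].Value && m.Value == m.ToString();
    Console.WriteLine(m.Value + " -> " + sameText);
}
```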

If you need every capture of a repeated group, e.g. for tokenizing math equations:

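using System.Linq;
using System.Text.RegularExpressions;

// INPUT (I need this tokenized to do math)
string sTests = "(1234+5678)/ (56.78-   1234   )";

// Each repetition of the outer "+" adds one capture (a number or a single non-digit character) to group 1
Regex splitter = new Regex(@"([\d,\.]+|\D)+");
Match match = splitter.Match(sTests.Replace(" ", ""));
string[] captures = (from capture in match.Groups.Cast<Group>().Last().Captures.Cast<Capture>()
                     select capture.Value).ToArray();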

...because you need to collect all the captures of the last group, not just its Value, which holds only the final capture.
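To see the difference concretely: a group inside a repeated construct records every repetition in its Captures collection, while its Value reports only the last one. A small sketch (C# 9 top-level statements) with the same pattern and input as the tokenizing answer; here group 1 is addressed by index, which for this pattern is the same group that Groups.Cast<Group>().Last() returns.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string sTests = "(1234+5678)/ (56.78-   1234   )";
Match match = new Regex(@"([\d,\.]+|\D)+").Match(sTests.Replace(" ", ""));

// Group 1 sits inside the outer "+", so it is captured once per token.
Group tokens = match.Groups[1];
Console.WriteLine(tokens.Value);          // only the last capture: )
Console.WriteLine(tokens.Captures.Count); // 11

string[] all = tokens.Captures.Cast<Capture>().Select(c => c.Value).ToArray();
Console.WriteLine(string.Join(" ", all)); // ( 1234 + 5678 ) / ( 56.78 - 1234 )
```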

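Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.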

 