

C# List<Match>.AddRange() very slow

(Problem solved. See my answer below.)

I just profiled my project (WinForms / C#) because it felt much slower than before. Strangely, List.AddRange() accounts for 92% of the total profiled time.

Code1: With the following code, it takes 2m30s to finish a scan job (not in profiling mode):

        var allMatches = new List<Match>();
        foreach (var typedRegex in Regexes)
        {
            var ms = typedRegex.Matches(text); //typedRegex is just Regex.
            allMatches.AddRange(ms);
        }

| Function Name | Total CPU [unit, %] | Self CPU [unit, %] | Module | Category |
|---|---|---|---|---|
| [External Call] System.Collections.Generic.List.InsertRange(int, System.Collections.Generic.IEnumerable<.0>) | 146579 (92.45%) | 146579 (92.45%) | Multiple modules | IO \| Kernel |

Code2: So I removed the AddRange, and it takes only 1.6s:

        var allMatches = new List<Match>();
        foreach (var typedRegex in Regexes)
        {
            var ms = typedRegex.Matches(text);
            // allMatches.AddRange(ms);
        }

Code3: Thinking that there might be some kind of "lazy load" mechanism, I added a counter to trigger Regex.Matches(). The value of the counter is displayed in the UI. Now it takes 9s:

        public static int Count = 0;
        var allMatches = new List<Match>();
        foreach (var typedRegex in Regexes)
        {
            var ms = typedRegex.Matches(text);
            // allMatches.AddRange(ms);
            Count += ms.Count;
        }

Code4: Noticing that the value of Count is 32676, I pre-allocated memory for the list. It still takes 9s:

        public static int Count = 0;
        var allMatches = new List<Match>(33000);
        foreach (var typedRegex in Regexes)
        {
            var ms = typedRegex.Matches(text);
            // allMatches.AddRange(ms);
            Count += ms.Count;
        }

Code5: Thinking that List.AddRange(MatchCollection) might be unusual, I changed the code to foreach(...) {List.Add(match)}, but nothing changed: 2m30s. The profile says:

| Function Name | Total CPU [unit, %] | Self CPU [unit, %] | Module | Category |
|---|---|---|---|---|
| [External Call] System.Text.RegularExpressions.MatchCollection.MatchCollection+Enumerator.MoveNext() | 183804 (92.14%) | 183804 (92.14%) | Multiple modules | IO \| Kernel |

Code6: SelectMany takes 2m30s as well. It's my oldest solution.

    var allMatches = Regexes.SelectMany(i => i.Matches(text)); 

So, maybe creating a list of up to 32676 items is a big deal, but taking more than 10 times as long as creating those Match objects is beyond imagination. Just one day earlier, the job took 27s to finish. I made a lot of changes today, and thought the profiler would tell me why. But it didn't. That AddRange() has been there for a month. I can barely remember seeing its name in any previous profile.

I will try to remember what happened during the day. But could anybody explain the profile result above? Thanks for any help.

In the end, the problem was not AddRange() but Regex.Matches(). After I optimized the regex, the time dropped from 2m30s to less than 11s.
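The post does not show which pattern was slow or how it was rewritten. One way to find an expensive pattern is to time each regex on its own, reading Count to force the matching to happen (as Code3 in the question did). The following is a minimal sketch under that assumption; `RegexCost` and `Measure` are hypothetical names, not part of the original project.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexCost
{
    // Hypothetical helper: runs every regex against the same text and
    // reports how long each full scan took, slowest first.
    public static List<(string Pattern, int Matches, TimeSpan Elapsed)> Measure(
        IEnumerable<Regex> regexes, string text)
    {
        var results = new List<(string Pattern, int Matches, TimeSpan Elapsed)>();
        foreach (var regex in regexes)
        {
            var sw = Stopwatch.StartNew();
            // Reading Count forces the whole input to be scanned.
            int count = regex.Matches(text).Count;
            sw.Stop();
            // Regex.ToString() returns the pattern string.
            results.Add((regex.ToString(), count, sw.Elapsed));
        }
        results.Sort((a, b) => b.Elapsed.CompareTo(a.Elapsed));
        return results;
    }

    public static void Main()
    {
        var regexes = new[] { new Regex(@"\d+"), new Regex(@"[a-z]+") };
        foreach (var r in Measure(regexes, "ab 12 cd 345 e"))
            Console.WriteLine($"{r.Pattern}: {r.Matches} matches, {r.Elapsed.TotalMilliseconds} ms");
    }
}
```

On a realistic input, the pattern at the top of the list is the first candidate for rewriting.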

First of all, Regex.Matches() IS using some kind of lazy loading (and, apparently, multiple threads). That's why it returns a MatchCollection rather than a normal list. MatchCollection creates an item only when you use that item.
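The deferred work described above can be made visible by timing three phases separately: calling Matches(), forcing the scan, and copying the results into a list. This is a sketch rather than the original project's code; `LazyMatchTiming` and `Collect` are hypothetical names, and the sample text is invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class LazyMatchTiming
{
    // Hypothetical helper: separates the cost of the Matches() call, the
    // actual regex scan, and the copy into a List<Match>.
    public static List<Match> Collect(Regex regex, string text,
        out TimeSpan callTime, out TimeSpan scanTime, out TimeSpan copyTime)
    {
        var sw = Stopwatch.StartNew();
        MatchCollection ms = regex.Matches(text); // no matching done yet
        callTime = sw.Elapsed;

        sw.Restart();
        int count = ms.Count;                     // forces the full scan
        scanTime = sw.Elapsed;

        sw.Restart();
        var list = new List<Match>(count);
        list.AddRange(ms.Cast<Match>());          // matches are already cached
        copyTime = sw.Elapsed;
        return list;
    }

    public static void Main()
    {
        string text = string.Concat(Enumerable.Repeat("abc 123 def 4567 ", 50000));
        var list = Collect(new Regex(@"\d+"), text,
            out var call, out var scan, out var copy);
        Console.WriteLine($"{list.Count} matches");
        Console.WriteLine($"Matches(): {call.TotalMilliseconds} ms");
        Console.WriteLine($"Count:     {scan.TotalMilliseconds} ms");
        Console.WriteLine($"AddRange:  {copy.TotalMilliseconds} ms");
    }
}
```

If the scan is not forced first, its cost is paid inside whatever enumerates the collection, which is why the profiler attributed it to InsertRange() and MoveNext().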

MatchCollection.Count() costs less than ToArray(), just like IEnumerable.Count() costs less than IEnumerable.ToArray() (less garbage to collect?).

Here is code from MatchCollection:

private Match GetMatch(int i)
{
  if (this._matches.Count > i)
    return this._matches[i];
  if (this._done)
    return (Match) null;
  Match match;
  do
  {
    match = this._regex.Run(false, this._prevlen, this._input, 0, this._input.Length, this._startat);
    if (!match.Success)
    {
      this._done = true;
      return (Match) null;
    }
    this._matches.Add(match);
    this._prevlen = match.Length;
    this._startat = match._textpos;
  }
  while (this._matches.Count <= i);
  return match;
}

And it is so lazy that if you ask for the 2nd item, it never does the work for the 3rd.
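One practical consequence: when only the first few matches are needed, breaking out of the enumeration early means the rest of the input is never scanned. A minimal sketch follows, with `FirstValues` as a hypothetical helper. It uses a plain foreach rather than LINQ's Take(), on the assumption that some Take() implementations read Count on list-like sources, which would force the full scan.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EarlyExit
{
    // Hypothetical helper: returns the values of at most `limit` matches.
    public static List<string> FirstValues(Regex regex, string text, int limit)
    {
        var result = new List<string>();
        if (limit <= 0) return result;

        // Each loop iteration asks the collection for exactly one more match.
        foreach (Match m in regex.Matches(text))
        {
            result.Add(m.Value);
            if (result.Count == limit) break; // stop before requesting the next match
        }
        return result;
    }

    public static void Main()
    {
        var values = FirstValues(new Regex(@"\d+"), "1 22 333 4444", 2);
        Console.WriteLine(string.Join(", ", values));
    }
}
```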
