High runtime for Dictionary.Add with a large number of items

I have a C# application that stores data from a text file in a Dictionary object. The amount of data to be stored can be rather large, so inserting the entries takes a lot of time. With many items in the Dictionary it gets even worse, because of the resizing of the internal array that stores the Dictionary's data. So I initialized the Dictionary with the number of items that will be added, but this has no impact on speed.

Here is my function:

private Dictionary<IdPair, Edge> AddEdgesToExistingNodes(HashSet<NodeConnection> connections)
{
  Dictionary<IdPair, Edge> resultSet = new Dictionary<IdPair, Edge>(connections.Count);

  foreach (NodeConnection con in connections)
  {
    ...
    resultSet.Add(nodeIdPair, newEdge);
  }

  return resultSet;
}

In my tests, I insert ~300k items. I checked the running time with ANTS Performance Profiler and found that the average time for resultSet.Add(...) doesn't change when I initialize the Dictionary with the needed size. It is the same as when I initialize the Dictionary with new Dictionary() (about 0.256 ms on average for each Add). This is definitely caused by the amount of data in the Dictionary (even though I initialized it with the desired size): for the first 20k items, the average time for Add is only 0.03 ms per item.

Any idea how to make the Add operation faster?

Thanks in advance, Frank

Here is my IdPair struct:

public struct IdPair
{
  public int id1;
  public int id2;

  public IdPair(int oneId, int anotherId)
  {
    if (oneId > anotherId)
    {
      id1 = anotherId;
      id2 = oneId;
    }
    else if (anotherId > oneId)
    {
      id1 = oneId;
      id2 = anotherId;
    }
    else
      throw new ArgumentException("The two Ids of the IdPair can't have the same value.");
  }
}

Since you have a struct, you get the default implementation of Equals() and GetHashCode(). As others have pointed out, this is not very efficient since it uses reflection, but I don't think the reflection is the issue.

My guess is that your hash codes get distributed unevenly by the default GetHashCode(), which could happen, for example, if the default implementation returns a simple XOR of all members (in which case hash(a, b) == hash(b, a)). I can't find any documentation of how ValueType.GetHashCode() is implemented, but try adding

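public override int GetHashCode() {
    // Use the struct's fields (id1, id2); oneId/anotherId exist only in the constructor.
    // id1 goes into the upper 16 bits, the low 16 bits of id2 into the lower half.
    return id1 << 16 | (id2 & 0xffff);
}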

which might be better.
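To make the uneven-distribution guess concrete, here is a minimal sketch that counts how many distinct hash codes two candidate functions produce over the same set of pairs. XorHash is a hypothetical stand-in for a weak hash, not a claim about what ValueType.GetHashCode() actually does, and ShiftHash is the suggestion above written against the fields id1 and id2. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class HashSpreadDemo
{
    // Hypothetical weak hash: plain XOR of both ids (an assumption for illustration).
    public static int XorHash(int id1, int id2)
    {
        return id1 ^ id2;
    }

    // The shift-based hash suggested above, using the struct's field names.
    public static int ShiftHash(int id1, int id2)
    {
        return id1 << 16 | (id2 & 0xffff);
    }

    // Counts the distinct hash codes over all pairs with 0 <= id1 < id2 < n.
    public static int CountDistinct(int n, Func<int, int, int> hash)
    {
        var seen = new HashSet<int>();
        for (int a = 0; a < n; a++)
            for (int b = a + 1; b < n; b++)
                seen.Add(hash(a, b));
        return seen.Count;
    }

    public static void Main()
    {
        const int n = 512;
        Console.WriteLine("pairs: " + (n * (n - 1) / 2));            // 130816
        Console.WriteLine("xor:   " + CountDistinct(n, XorHash));    // 511
        Console.WriteLine("shift: " + CountDistinct(n, ShiftHash));  // 130816
    }
}
```

With only 511 distinct codes for 130816 keys, each lookup chain under the XOR hash would hold roughly 256 entries on average, so Add degrades from near-constant time towards a linear scan regardless of the initial capacity. Note that the shift-based hash is only collision-free here because both ids stay below 65536; for larger ids it discards bits and a multiplicative combination may spread better.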

IdPair is a struct, and you haven't overridden Equals or GetHashCode. This means that the default implementation of those methods will be used.

For value types, the default implementation of Equals and GetHashCode can use reflection, which is likely to result in poor performance. Try providing your own implementation of the methods and see if that helps.

My suggested implementation; it might not be exactly what you need or want:

public struct IdPair : IEquatable<IdPair>
{
    // ...

    public override bool Equals(object obj)
    {
        if (obj is IdPair)
            return Equals((IdPair)obj);

        return false;
    }

    public bool Equals(IdPair other)
    {
        return id1.Equals(other.id1)
            && id2.Equals(other.id2);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 269;
            hash = (hash * 19) + id1.GetHashCode();
            hash = (hash * 19) + id2.GetHashCode();
            return hash;
        }
    }
}
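For completeness, here is a self-contained sketch of how the pieces might fit together: the question's constructor combined with the overrides above, used as a Dictionary key. The Edge type is not shown in the question, so a string stands in for it, and the IdPairUsage class, its Build helper, and the generated ids are assumptions made purely for illustration.

```csharp
using System;
using System.Collections.Generic;

public struct IdPair : IEquatable<IdPair>
{
    public readonly int id1;
    public readonly int id2;

    public IdPair(int oneId, int anotherId)
    {
        if (oneId == anotherId)
            throw new ArgumentException("The two Ids of the IdPair can't have the same value.");
        // Store the smaller id first so that (a, b) and (b, a) are the same key.
        id1 = Math.Min(oneId, anotherId);
        id2 = Math.Max(oneId, anotherId);
    }

    public bool Equals(IdPair other)
    {
        return id1 == other.id1 && id2 == other.id2;
    }

    public override bool Equals(object obj)
    {
        return obj is IdPair && Equals((IdPair)obj);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 269;
            hash = (hash * 19) + id1;
            hash = (hash * 19) + id2;
            return hash;
        }
    }
}

public static class IdPairUsage
{
    // Hypothetical helper: fills a pre-sized dictionary with 'count' consecutive pairs.
    public static Dictionary<IdPair, string> Build(int count)
    {
        var result = new Dictionary<IdPair, string>(count);
        for (int i = 0; i < count; i++)
            result.Add(new IdPair(i, i + 1), "edge " + i); // string stands in for Edge
        return result;
    }

    public static void Main()
    {
        var edges = Build(300000);
        Console.WriteLine(edges.Count);                        // 300000
        Console.WriteLine(edges.ContainsKey(new IdPair(5, 4))); // True
    }
}
```

Implementing IEquatable<IdPair> lets the dictionary's default comparer call the typed Equals directly, which avoids boxing the key on every comparison.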
