
How is it possible that a handmade dictionary is way faster than the .NET Dictionary?

Reviewing an open-source project, I came across an interesting data structure:

// Represents a layer of "something" that covers the map
public class CellLayer<T> : IEnumerable<T>
{
    public readonly Size Size;
    public readonly TileShape Shape;
    public event Action<CPos> CellEntryChanged = null;

    readonly T[] entries;

    public CellLayer(Map map)
        : this(map.TileShape, new Size(map.MapSize.X, map.MapSize.Y)) { }

    public CellLayer(TileShape shape, Size size)
    {
        Size = size;
        Shape = shape;
        entries = new T[size.Width * size.Height];
    }

    public void CopyValuesFrom(CellLayer<T> anotherLayer)
    {
        if (Size != anotherLayer.Size || Shape != anotherLayer.Shape)
            throw new ArgumentException(
                "layers must have a matching size and shape.", "anotherLayer");
        if (CellEntryChanged != null)
            throw new InvalidOperationException(
                "Cannot copy values when there are listeners attached to the CellEntryChanged event.");
        Array.Copy(anotherLayer.entries, entries, entries.Length);
    }

    // Resolve an array index from cell coordinates
    int Index(CPos cell)
    {
        return Index(cell.ToMPos(Shape));
    }

    // Resolve an array index from map coordinates
    int Index(MPos uv)
    {
        return uv.V * Size.Width + uv.U;
    }

    /// <summary>Gets or sets the <see cref="OpenRA.CellLayer"/> using cell coordinates</summary>
    public T this[CPos cell]
    {
        get
        {
            return entries[Index(cell)];
        }

        set
        {
            entries[Index(cell)] = value;

            if (CellEntryChanged != null)
                CellEntryChanged(cell);
        }
    }

    /// <summary>Gets or sets the layer contents using raw map coordinates (not CPos!)</summary>
    public T this[MPos uv]
    {
        get
        {
            return entries[Index(uv)];
        }

        set
        {
            entries[Index(uv)] = value;

            if (CellEntryChanged != null)
                CellEntryChanged(uv.ToCPos(Shape));
        }
    }

    /// <summary>Clears the layer contents with a known value</summary>
    public void Clear(T clearValue)
    {
        for (var i = 0; i < entries.Length; i++)
            entries[i] = clearValue;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Array.GetEnumerator() returns a non-generic enumerator, so cast the array itself
        return ((IEnumerable<T>)entries).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

This structure represents a matrix of type T: given a CPos (an X,Y structure), it returns the T element at that position. Here's a sample usage:

var cellLayer = new CellLayer<CellInfo>(TileShape.Rectangle, new Size(1280, 1280));
cellLayer[new CPos(0, 1)] = new CellInfo(0, new CPos(0, 1), false);

Internally, the CellLayer class transforms the given CPos into an int, which serves as the index into the internal array.

From a client-side perspective, the class felt to me like a Dictionary, so I replaced the implementation with one. After several runtime tests and microbenchmarks, it turned out that using the Dictionary was dozens of times slower than using the handmade CellLayer class. That surprised me. Here are the tests I did:

    [Test]
    public void DictionaryTest()
    {
        var dic = new Dictionary<CPos, CellInfo>(1280 * 1280);

        var watch = Stopwatch.StartNew();

        for (int i = 0; i < 1280; i++)
            for (int u = 0; u < 1280; u++)
                dic[new CPos(i, u)] = new CellInfo(0, new CPos(i, u), false);

        Console.WriteLine(watch.ElapsedTicks);
    }

    [Test]
    public void CellLayerTest()
    {
        var dic = new CellLayer<CellInfo>(TileShape.Rectangle, new Size(1280,1280));

        var watch = Stopwatch.StartNew();

        for (int i = 0; i < 1280; i++)
            for (int u = 0; u < 1280; u++)
                dic[new CPos(i, u)] = new CellInfo(0, new CPos(i, u), false);

        Console.WriteLine(watch.ElapsedTicks);
    }

I thought that .NET collections were as optimized as possible. Can anyone explain to me how it is that using Dictionary is slower than using a "custom dictionary"?

Thanks

For the original version, you find the location of an entry by using this formula:

uv.V * Size.Width + uv.U

To find the location in a dictionary:

  1. Calculate the hash code for the CPos.
  2. Find the bucket in the dictionary using a modulus operation: hashcode % dictionarySize.
  3. If the bucket isn't empty, compare the CPos you have with the CPos in that bucket. If they don't match, you have a secondary hash code collision. Move to the next candidate entry (the next bucket, or the next entry in the bucket's chain, depending on the implementation) and retry step 3.
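
The three steps can be sketched with a toy hash table. This is only an illustration under stated assumptions, not the real Dictionary implementation: `Cell` is a hypothetical stand-in for CPos, `ToyHashTable` is an invented name, and collisions are resolved by chaining entries within a bucket.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for CPos: a small (X, Y) key with its own hash code.
public struct Cell : IEquatable<Cell>
{
    public readonly int X, Y;
    public Cell(int x, int y) { X = x; Y = y; }
    public bool Equals(Cell other) { return X == other.X && Y == other.Y; }
    public override bool Equals(object obj) { return obj is Cell && Equals((Cell)obj); }
    public override int GetHashCode() { return unchecked(X * 31 + Y); }
}

// Toy fixed-size hash table using separate chaining; not the real Dictionary.
public class ToyHashTable<TValue>
{
    readonly List<KeyValuePair<Cell, TValue>>[] buckets;

    public ToyHashTable(int bucketCount)
    {
        buckets = new List<KeyValuePair<Cell, TValue>>[bucketCount];
    }

    int BucketIndex(Cell key)
    {
        // Step 1: compute the hash code. Step 2: reduce it to a bucket with a modulus.
        var hash = key.GetHashCode() & 0x7FFFFFFF; // clear the sign bit
        return hash % buckets.Length;
    }

    public void Set(Cell key, TValue value)
    {
        var index = BucketIndex(key);
        var chain = buckets[index] ?? (buckets[index] = new List<KeyValuePair<Cell, TValue>>());
        for (var i = 0; i < chain.Count; i++)
        {
            if (chain[i].Key.Equals(key))
            {
                // Key already present: overwrite its value.
                chain[i] = new KeyValuePair<Cell, TValue>(key, value);
                return;
            }
        }
        chain.Add(new KeyValuePair<Cell, TValue>(key, value));
    }

    public bool TryGet(Cell key, out TValue value)
    {
        var chain = buckets[BucketIndex(key)];
        if (chain != null)
        {
            // Step 3: compare keys until one matches; every mismatch is a collision.
            foreach (var pair in chain)
            {
                if (pair.Key.Equals(key)) { value = pair.Value; return true; }
            }
        }
        value = default(TValue);
        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        var table = new ToyHashTable<string>(16);
        table.Set(new Cell(0, 1), "grass");
        string found;
        Console.WriteLine(table.TryGet(new Cell(0, 1), out found) ? found : "missing");
    }
}
```

Even in this simplified form, a lookup costs a hash computation, a modulus, and at least one key comparison, whereas the array version costs one multiplication and one addition.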

If you have a primary hash code collision, which is to say lots of different CPos values have the same hash code, your dictionary is going to be ridiculously slow.
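
One way to check for this is to count how many distinct hash codes the keys actually produce. The sketch below does that for a 1280 x 1280 grid with two hash functions. Both functions are assumptions chosen for illustration (I don't know how CPos really implements GetHashCode), and `HashSpread` and `CountDistinct` are invented names.

```csharp
using System;
using System.Collections.Generic;

public static class HashSpread
{
    // Counts how many distinct hash codes a size x size grid of (x, y) keys produces.
    public static int CountDistinct(int size, Func<int, int, int> hash)
    {
        var seen = new HashSet<int>();
        for (var x = 0; x < size; x++)
            for (var y = 0; y < size; y++)
                seen.Add(hash(x, y));
        return seen.Count;
    }

    public static void Main()
    {
        const int size = 1280; // same grid size as the tests in the question

        // Hypothetical weak hash: XOR folds many different cells onto the same value.
        var weak = CountDistinct(size, (x, y) => x ^ y);

        // Hypothetical strong hash for this grid: the row-major index is unique per cell.
        var strong = CountDistinct(size, (x, y) => y * size + x);

        Console.WriteLine("cells:        " + size * size);
        Console.WriteLine("x ^ y:        " + weak);
        Console.WriteLine("y * size + x: " + strong);
    }
}
```

With the XOR variant, the 1,638,400 cells share only 2,048 distinct hash codes, about 800 keys per code, so almost every lookup has to walk through a long run of colliding keys. The row-major variant gives every cell its own hash code.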

If you have unique hash codes, then it is probably the modulus operation that is killing performance. But you would need to attach a profiler (e.g. Redgate ANTS) to find out for sure.

A dictionary maintains a searchable set of items (through a hash table, a binary tree, or something similar). So each Add() and each [key] access implies some search, which tends to be relatively "slow".

In your case, if there is a simple mapping from a CPos to an integer (i.e. an array index), there is no search, just direct (and fast) access to a cell of an array.

Or, to put it more simply: essentially you are comparing a hash table/binary tree against a flat array.


Edit

Of course both collections are rather fast and show O(1) complexity. A lookup in a hash table is more complex than an array index operation, however.
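
To see the constant-factor difference in isolation, here is a rough timing sketch that does the same work through a flat array and through a `Dictionary<int, int>` keyed by the row-major index, so hash collisions are not a factor. The names `LookupCost`, `ArraySum`, and `DictionarySum` are invented for this example, and the timings depend on the machine and runtime, so treat them as indicative only.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupCost
{
    // Fills a flat array via a computed row-major index, then sums it back.
    public static long ArraySum(int size)
    {
        var cells = new int[size * size];
        for (var y = 0; y < size; y++)
            for (var x = 0; x < size; x++)
                cells[y * size + x] = x + y;

        long sum = 0;
        for (var i = 0; i < cells.Length; i++)
            sum += cells[i];
        return sum;
    }

    // Same work, but every write and read goes through a hash lookup.
    public static long DictionarySum(int size)
    {
        var cells = new Dictionary<int, int>(size * size);
        for (var y = 0; y < size; y++)
            for (var x = 0; x < size; x++)
                cells[y * size + x] = x + y;

        long sum = 0;
        for (var i = 0; i < size * size; i++)
            sum += cells[i];
        return sum;
    }

    public static void Main()
    {
        const int size = 1280;

        // Warm up so JIT compilation is not included in the timings.
        ArraySum(16);
        DictionarySum(16);

        var watch = Stopwatch.StartNew();
        var arrayTotal = ArraySum(size);
        var arrayMs = watch.ElapsedMilliseconds;

        watch.Restart();
        var dictionaryTotal = DictionarySum(size);
        var dictionaryMs = watch.ElapsedMilliseconds;

        Console.WriteLine("array:      " + arrayMs + " ms (sum " + arrayTotal + ")");
        Console.WriteLine("dictionary: " + dictionaryMs + " ms (sum " + dictionaryTotal + ")");
    }
}
```

Both methods return the same sum, which confirms they do equivalent work. Any gap in the timings then comes from the per-access overhead of hashing and key comparison rather than from algorithmic complexity.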
