
Is there a more efficient way to implement an associative matrix in C#?

I implemented an algorithm using dictionaries with tuple keys. It works, but it is very slow.

I have a set of strings, and I am trying to implement an associative matrix where A["abc","bcde"] = 2, the length of the overlap between the two strings. The tuples in L are the keys of A, and L is sorted so that A[L[i]] < A[L[i+1]]. I merge the two strings in the set that have the biggest overlap, then update the "matrix" and the list L. I repeat this in a loop until the set has only one element.

My problem is that the algorithm is too slow with a dictionary. Is there a more efficient way to do this? Here is my code:

List<string> words = new List<string>(wordsFromFile);

Dictionary<Tuple<string, string>, int> A = new Dictionary<Tuple<string, string>, int>();
List<Tuple<string, string>> L = new List<Tuple<string,string>>();

(I used counting sort to build L. After that, refreshing the matrix and the list is very time-consuming:)

            while (words.Count > 1)
            {
                string LastItem1 = L.Last().Item1;
                string LastItem2 = L.Last().Item2;
                words.Remove(LastItem1);
                words.Remove(LastItem2);
                string newElement = merge(LastItem1, LastItem2);
                words.Add(newElement);
                for (int i = 0; i < words.Count; ++i)
                {
                    if (words[i] == newElement)
                    {
                        Tuple<string, string> tmp = new Tuple<string, string>(newElement, newElement);
                        A[tmp] = 0;
                    }
                    else
                    {
                        Tuple<string, string> tmp = new Tuple<string, string>(newElement, words[i]);
                        A[tmp] = A[new Tuple<string, string>(LastItem2, words[i])];
                        tmp = new Tuple<string, string>(words[i], newElement);
                        A[tmp] = A[new Tuple<string, string>(words[i], LastItem1)];
                    }
                }
                var itemsToRemove = A.Where(f => f.Key.Item1 == LastItem1 || f.Key.Item1 == LastItem2 || f.Key.Item2 == LastItem1 || f.Key.Item2 == LastItem2).ToArray();
                foreach (var item in itemsToRemove)
                    A.Remove(item.Key);

                L.Remove(L.Last());
                for (int i = 0; i < L.Count(); ++i)
                {
                    if (L[i].Item1 == LastItem2 && L[i].Item2 != LastItem1 && L[i].Item2 != newElement && L[i].Item2 != LastItem2) L[i] = new Tuple<string, string>(newElement, L[i].Item2);
                    else if (L[i].Item2 == LastItem1 && L[i].Item1 != LastItem1 && L[i].Item1 != newElement && L[i].Item1 != LastItem2) L[i] = new Tuple<string, string>(L[i].Item1, newElement);
                }

                var listitemsToRemove = L.Where(f => f.Item1 == LastItem1 || f.Item2 == LastItem2 || f.Item1 == LastItem2 || f.Item2 == LastItem1).ToArray();
                foreach (var item in listitemsToRemove) L.Remove(item);
                listitemsToRemove = L.Where(f => f.Item2 == LastItem2).ToArray();

            }
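
For reference, the procedure described in the question (pick the pair with the largest overlap, merge it, update the matrix, repeat until one string remains) can be sketched with a plain int[,] indexed by word position instead of a dictionary keyed by string tuples. This is only a sketch under assumptions: the question does not show merge or how A is filled, so Overlap here is assumed to be the longest suffix of the first string that is also a prefix of the second (which gives 2 for "abc" and "bcde"), and the names GreedyOverlapSketch, Overlap and MergeAll are made up for illustration. It also swaps the sorted list L for a linear scan of the matrix, and it reuses the question's update rule, which may miss longer overlaps when one string is contained in another.

```csharp
using System;
using System.Collections.Generic;

static class GreedyOverlapSketch
{
    // Assumed definition: length of the longest suffix of `left`
    // that is also a prefix of `right`.
    public static int Overlap(string left, string right)
    {
        int max = Math.Min(left.Length, right.Length);
        for (int k = max; k > 0; --k)
        {
            if (string.CompareOrdinal(left, left.Length - k, right, 0, k) == 0)
                return k;
        }
        return 0;
    }

    // Repeatedly merges the pair with the largest overlap until one string is left.
    public static string MergeAll(IReadOnlyList<string> input)
    {
        int n = input.Count;
        if (n == 0) return string.Empty;

        var words = new string[n];
        var alive = new bool[n];
        var overlap = new int[n, n]; // overlap[i, j] plays the role of A[words[i], words[j]]
        for (int i = 0; i < n; ++i)
        {
            words[i] = input[i];
            alive[i] = true;
        }
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                if (i != j) overlap[i, j] = Overlap(words[i], words[j]);

        for (int remaining = n; remaining > 1; --remaining)
        {
            // Linear scan for the best live pair; this stands in for the sorted list L.
            int bestI = -1, bestJ = -1, best = -1;
            for (int i = 0; i < n; ++i)
            {
                if (!alive[i]) continue;
                for (int j = 0; j < n; ++j)
                {
                    if (i == j || !alive[j]) continue;
                    if (overlap[i, j] > best) { best = overlap[i, j]; bestI = i; bestJ = j; }
                }
            }

            // The merged string reuses slot bestI; slot bestJ is retired.
            words[bestI] = words[bestI] + words[bestJ].Substring(best);
            for (int k = 0; k < n; ++k)
            {
                // Same update rule as the question: the merged string ends like
                // words[bestJ], so its outgoing overlaps are copied from that row.
                // Its incoming overlaps (column bestI) are already in place.
                if (alive[k] && k != bestI && k != bestJ) overlap[bestI, k] = overlap[bestJ, k];
            }
            alive[bestJ] = false;
        }

        for (int i = 0; i < n; ++i)
            if (alive[i]) return words[i];
        return string.Empty; // not reached when n >= 1
    }
}
```

For example, GreedyOverlapSketch.MergeAll(new[] { "abc", "bcde", "dex" }) returns "abcdex". No tuple is allocated or hashed inside the loop, and nothing is removed from a collection by value; whether that is fast enough for a given input size would need measuring.
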
  1. It's hard to read code this convoluted, but one thing that jumps out at me is this:

    L[i].Item1

    Scanning a list and comparing tuple items like this is sub-optimal compared to a dictionary lookup. I imagine you want to retain ordering, in which case you can use an OrderedDictionary (the non-generic class in System.Collections.Specialized; a generic OrderedDictionary<TKey, TValue> is only available from .NET 9).

  2. You use a for loop where a foreach loop would serve you better. A for loop is usually faster in raw performance, but not the way you are using it: you do about 12 lookups on L per iteration. L is a List<T>, so each L[i] is a call to the list's indexer. That call is array-backed and O(1), but it is still bounds-checked every time, and repeating it that often adds up. foreach hands you the current element once, which has the same effect as reading L[i] into a local variable at the top of the loop body. Where you assign back to L[i], keep the for loop and cache the element in a local instead.

  3. words[i] is looked up several times per iteration (inefficient compared to a foreach loop), where a single lookup would do.
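
Points 2 and 3 amount to reading each element once per iteration. Here is a minimal sketch of the relabelling loop over L written that way, assuming the same roles as in the question (first is LastItem1, second is LastItem2, merged is newElement). RelabelSketch and Relabel are hypothetical names. The for loop is kept because the body assigns back into the list by index.

```csharp
using System;
using System.Collections.Generic;

static class RelabelSketch
{
    // Rewrites pairs that touch one of the merged strings so that they
    // refer to the merged string instead. Same conditions as the loop
    // in the question, but pairs[i] is read once per iteration.
    public static void Relabel(
        List<Tuple<string, string>> pairs, string first, string second, string merged)
    {
        for (int i = 0; i < pairs.Count; ++i)
        {
            Tuple<string, string> pair = pairs[i]; // single indexer call
            string a = pair.Item1;
            string b = pair.Item2;

            if (a == second && b != first && b != merged && b != second)
                pairs[i] = Tuple.Create(merged, b);
            else if (b == first && a != first && a != merged && a != second)
                pairs[i] = Tuple.Create(a, merged);
        }
    }
}
```

This only trims the per-iteration overhead. The full scans of A and L and the Remove calls in the question are still linear in the size of the collection on every pass.
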
