简体   繁体   English

c#SortedList <string,TValue> .ContainsKey表示成功添加的键返回false

[英]c# SortedList<string, TValue>.ContainsKey for successfully added key returns false

CHECK UPDATE 3 below I found out the issue I ran into is related to a known serious problem with c# string comparers for .Net 4.0, 4.0 client and 4.5, that will lead to inconsistent sort order of lists of strings (causing the output to depend on the order in the input and the sort algorithm used). 检查下面的更新3我发现我遇到的问题与.Net 4.0,4.0客户端和4.5的c#字符串比较器的已知严重问题有关,这将导致字符串列表的排序顺序不一致(导致输出依赖在输入的顺序和使用的排序算法)。 The problem was reported to Microsoft in Dec 2012 and closed as "won't be fixed". 该问题于2012年12月向微软报告,并以“不会修复”结束。 A work around is available, but is so much slower that it is hardly practical for large collections. 可以使用一种解决方法,但速度慢得多,对于大型集合来说几乎不实用。

While implementing an immutable PatriciaTrie, I wanted to compare its performance with a System.Collections.Generic.SortedList. 在实现不可变的PatriciaTrie时,我想将其性能与System.Collections.Generic.SortedList进行比较。 I used the following file https://github.com/rkapsi/patricia-trie/blob/master/src/test/resources/org/ardverk/collection/hamlet.txt to create an input wordlist for testing. 我使用以下文件https://github.com/rkapsi/patricia-trie/blob/master/src/test/resources/org/ardverk/collection/hamlet.txt来创建用于测试的输入词表。

When inserting each of the words in the c# SortedList, using either Comparer<string>.Default or StringComparer.InvariantCulture as the key comparer, a number of the entries that are successfully inserted cannot be retrieved using the normal search methods (eg ContainsKey returns false) but the key is present in the list as is observed by iterating the list. 在c#SortedList中插入每个单词时,使用Comparer<string>.DefaultStringComparer.InvariantCulture作为键比较器,无法使用常规搜索方法检索成功插入的多个条目(例如ContainsKey返回false )但是密钥存在于列表中,如通过迭代列表所观察到的那样。

Even more curious, the comparer returns the value '0' when comparing the key retrieved from the sorted list with the search key that could not be found using ContainsKey . 更奇怪的是,比较器在将从排序列表中检索到的密钥与使用ContainsKey无法找到的搜索关键字进行比较时返回值“0”。

The complete example below demonstrates this issue on my system. 下面的完整示例演示了我的系统上的这个问题。

using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        // the problem is possibly related to comparison.
        var fail = true;
        var comparer = fail ? StringComparer.InvariantCulture : StringComparer.Ordinal;

        // read hamlet (contains duplicate words)
        var words = File
            .ReadAllLines("hamlet.txt")
            .SelectMany(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            .Select(w => w.Trim())
            .Where(w => !string.IsNullOrEmpty(w))
            .Distinct(comparer)
            .ToArray();

        // insert hamlet's words in the sorted list.
        var list = new SortedList<string, int>(comparer);
        var ndx = 0;
        foreach (var word in words)
            list[word] = ndx++;

        // search for each of the added words.
        foreach (var keyToSearch in words)
        {
            if (!list.ContainsKey(keyToSearch))
            {
                // was inserted, but cannot be retrieved.
                Console.WriteLine("Error - Key not found: \"{0}\"", keyToSearch);
                // however when we iterate over the list, we see that the entry is present
                var prefix = keyToSearch.Substring(0, Math.Min(keyToSearch.Length, 3));
                foreach (var wordCloseToSearchKey in list.Keys.Where(s => s.StartsWith(prefix)))
                {
                    // and using the SortedList's supplied comparison returns 0, signaling equality
                    var comparisonResult = list.Comparer.Compare(wordCloseToSearchKey, keyToSearch);
                    Console.WriteLine("{0} - comparison result = {1}", wordCloseToSearchKey, comparisonResult);
                }
            }
        }

        // Check that sort order of List.Keys is correct 
        var keys = list.Keys.ToArray();
        BinarySearchAll("list.Keys", keys, list.Comparer);
        CheckCorrectSortOrder("list.Keys", keys, list.Comparer);

        // Check that sort order of Array.Sort(List.Keys) is correct 
        var arraySortedKeys = CopySortSearchAndCheck("Array.Sort(List.Keys)", keys, list.Comparer);

        // Check that sort order of the Array.Sort(input words) is correct 
        var sortedInput = CopySortSearchAndCheck("Array.Sort(input words)", words, list.Comparer);

        Console.ReadLine();
    }

    static string[] CopySortSearchAndCheck(string arrayDesc, string[] input, IComparer<string> comparer)
    {
        // copy input
        var sortedInput = new string[input.Length];
        Array.Copy(input, sortedInput, sortedInput.Length);

        // sort it
        Array.Sort(sortedInput, comparer);

        // check that we can actually find the keys in the array using bin. search
        BinarySearchAll(arrayDesc, sortedInput, comparer);

        // check that sort order is correct
        CheckCorrectSortOrder(arrayDesc, sortedInput, comparer);

        return sortedInput;
    }

    static void BinarySearchAll(string arrayDesc, string[] sortedInput, IComparer<string> comparer)
    {
        // check that each key in the input can be found using bin. search
        foreach (var word in sortedInput)
        {
            var ix = Array.BinarySearch(sortedInput, word, comparer);
            if (ix < 0)
                // and it appears it cannot!
                Console.WriteLine("Error - {0} - Key not found: \"{1}\"", arrayDesc, word);
        }
    }

    static void CheckCorrectSortOrder(string arrayDesc, string[] sortedKeys, IComparer<string> comparer)
    {
        for (int n = 0; n < sortedKeys.Length; n++)
        {
            for (int up = n + 1; up < sortedKeys.Length; up++)
            {
                var cmp = comparer.Compare(sortedKeys[n], sortedKeys[up]);
                if (cmp >= 0)
                {
                    Console.WriteLine(
                        "{0}[{1}] = \"{2}\" not < than {0}[{3}] = \"{4}\"  - cmp = {5}",
                        arrayDesc, n, sortedKeys[n], up, sortedKeys[up], cmp);
                }
            }
            for (int down = n - 1; down > 0; down--)
            {
                var cmp = comparer.Compare(sortedKeys[n], sortedKeys[down]);
                if (cmp <= 0)
                {
                    Console.WriteLine(
                        "{0}[{1}] = \"{2}\" not > than {0}[{3}] = \"{4}\"  - cmp = {5}",
                        arrayDesc, n, sortedKeys[n], down, sortedKeys[down], cmp);
                }
            }
        }
    }
}

Does anyone have an explanation for this unexpected and odd behaviour? 有没有人对这种意外和奇怪的行为有解释?

When changing the comparer used by the SortedList to StringComparer.Ordinal (by eg changing fail to false in the above example) the problem disappears, which seems to point to a comparison issue, but I don't quite understand why. 当将SortedList使用的比较器更改为StringComparer.Ordinal (例如,在上面的示例中将fail更改为false ),问题消失,这似乎指向比较问题,但我不太明白为什么。

UPDATED As noted by Sébastien the issue described here does not show up on .Net 3.5 and 3.5 client profiles. 更新如Sébastien所述,此处描述的问题未显示在.Net 3.5和3.5客户端配置文件上。 It does on .Net 4.0, 4.0 client and 4.5. 它适用于.Net 4.0,4.0客户端和4.5。

After some more digging, I noticed that if I take the sorted keys from the list and I run Array.BinarySearch on those keys, it also returns negative (not found) values for the same keys that are not found using SortedList.ContainsKey . 经过一些挖掘,我注意到如果我从列表中取出排序的键并在这些键上运行Array.BinarySearch ,它也会返回使用SortedList.ContainsKey找不到的相同键的负值(未找到)。 Thus this would suggest that the sort order of the keys is not correct. 因此,这表明密钥的排序顺序不正确。

If I take the already sorted keys from the list and sort them using Array.Sort the sort order of the output is different for the keys that were problematic. 如果我从列表中获取已排序的键并使用Array.Sort对它们进行排序,则输出的排序顺序对于有问题的键是不同的。

So I added a function to check (using the list's comparer) if the sort order of a given array is correct (ie a preceding key is always smaller, a succeeding key is always larger) and restricted the input to words that are distinct according to the comparer. 所以我添加了一个函数来检查(使用列表的比较器),如果给定数组的排序顺序正确(即前一个键总是更小,后续键总是更大)并将输入限制为根据不同的单词。比较器。 I applied this function on 3 different inputs (all using the same comparer): 我在3个不同的输入上应用了这个功能(全部使用相同的比较器):

  1. the SortedList's Keys collection. SortedList的Keys集合。
  2. the output of Array.Sort on these keys. 这些键上的Array.Sort输出。
  3. the output of Array.Sort on the input from the file. Array.Sort的输出来自文件的输入。

The output of (2) and (3) is identical and different from (1). (2)和(3)的输出与(1)相同且不同。 However performing Array.BinarySearch on the Array.Sort outputs of (2) and (3) again fails to find the same keys (returning < 0). 但是,在(2)和(3)的Array.Sort输出上执行Array.BinarySearch再次无法找到相同的键(返回<0)。 Also the function that checks for correct sort order indicates that for all 3 cases, the sort order around the involved problematic keys is not correct. 此外,检查正确排序顺序的函数表明,对于所有3种情况,涉及的有问题键的排序顺序不正确。

At this point I am just hoping I did something incredibly stupid and there is a simple explanation. 在这一点上,我只是希望我做了一些非常愚蠢的事情并且有一个简单的解释。 Hopefully someone can point that out to me. 希望有人能指出这一点。

The example code is updated with my additional troubleshooting experiments, a screenshot of the output can be found here http://imgur.com/DU8SCsA . 示例代码通过我的其他故障排除实验进行更新,输出的屏幕截图可以在http://imgur.com/DU8SCsA找到。

UPDATE 2 Ok, I have narrowed the problem down to what seems to me like a very serious problem with c# string comparers introduced as of .Net 4.0. 更新2好的,我把这个问题缩小到了我认为是一个非常严重的问题,因为.Net 4.0引入了c#字符串比较器。

In summary, suppose we have 3 values: a1, a2 and a3. 总之,假设我们有3个值:a1,a2和a3。 For any kind of sorting to work correctly, we would expect that if a1 < a2 and a2 < a3 that in order for comparison to be consistent, as a consequence the following also holds: a1 < a3 . 为了使任何类型的排序正常工作,我们希望如果a1 < a2a2 < a3为了进行比较是一致的,那么下面的结果也是如此: a1 < a3

This is however not the case with the c# string comparers (at least for Comparer<string>.Default and StringComparer.InvariantCulture ). 然而,c#字符串比较器不是这种情况(至少对于Comparer<string>.DefaultStringComparer.InvariantCulture )。

The little program below illustrates this exact problem: 下面的小程序说明了这个确切的问题:

class Program
{
    static void Main(string[] args)
    {
        var comparer = StringComparer.InvariantCulture;
        var a1 = "A";
        var a2 = "a\'";
        var a3 = "\'a";
        PrintComparison("a1", a1, "a2", a2, comparer);
        PrintComparison("a2", a2, "a3", a3, comparer);
        PrintComparison("a1", a1, "a3", a3, comparer);
        Console.ReadLine();
    }
    public static void PrintComparison(string firstSymbol, string first, string secondSymbol, string second, IComparer<string> comparer)
    {
        var cmp = comparer.Compare(first, second);
        var result = cmp == 0 ? "=" : cmp < 0 ? "<" : ">";
        Console.WriteLine("{0} {1} {2}   ({3} {1} {4})", firstSymbol, result, secondSymbol, first, second);
    }
}

This is its output: 这是它的输出:

a1 < a2   (A < a')
a2 < a3   (a' < 'a)
a1 > a3   (A > 'a)

The conclusion would seem to be that it is not safe to rely on sort order determined using c# string comperators, or am I missing something? 结论似乎是依赖使用c#字符串编译器确定的排序顺序是不安全的,还是我错过了什么?

UPDATE 3 This issue seems to have been reported to MS in December 2012, and is closed with status "won't be fixed", which is rather disappointing; 更新3此问题似乎已于2012年12月向MS报告,并以“不会修复”状态结束,这是相当令人失望的; refer to link posted in comments below (it appears I can't post in here due to my limited reputation points). 请参阅以下评论中发布的链接(由于我的声誉点数有限,我似乎无法在此发布)。 This also lists a workaround, that I have implemented and used to verify that this indeed resolves the problems observed with the standard comparers. 这也列出了一个解决方法,我已经实现并用于验证这确实解决了标准比较器观察到的问题。

public class WorkAroundStringComparer : StringComparer
{
    private static readonly Func<CompareInfo, string, CompareOptions, int> _getHashCodeOfString;
    private readonly CompareInfo _compareInfo;
    private readonly CompareOptions _compareOptions;

    static WorkAroundStringComparer()
    {
        // Need this internal method to compute hashcode
        // as an IEqualityComparer implementation.
        _getHashCodeOfString = BuildGetHashCodeOfStringDelegate();
    }

    static Func<CompareInfo, string, CompareOptions, int> BuildGetHashCodeOfStringDelegate()
    {
        var compareInfoType = typeof(CompareInfo);
        var argTypes = new[] { typeof(string), typeof(CompareOptions) };
        var flags = BindingFlags.NonPublic | BindingFlags.Instance;
        var methods = compareInfoType.GetMethods(flags).ToArray(); ;
        var method = compareInfoType.GetMethod("GetHashCodeOfString", flags, null, argTypes, null);

        var instance = Expression.Parameter(compareInfoType, "instance");
        var stringArg = Expression.Parameter(typeof(string), "string");
        var optionsArg = Expression.Parameter(typeof(CompareOptions), "options");
        var methodCall = Expression.Call(instance, method, stringArg, optionsArg);
        var expr = Expression.Lambda<Func<CompareInfo, string, CompareOptions, int>>(methodCall, instance, stringArg, optionsArg);
        return expr.Compile();
    }

    public WorkAroundStringComparer()
        : this(CultureInfo.InvariantCulture)
    {
    }

    public WorkAroundStringComparer(CultureInfo cultureInfo, CompareOptions compareOptions = CompareOptions.None)
    {
        if (cultureInfo == null)
            throw new ArgumentNullException("cultureInfo");
        this._compareInfo = cultureInfo.CompareInfo;
        this._compareOptions = compareOptions;
    }

    public override int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y))
            return 0;
        if (ReferenceEquals(x, null))
            return -1;
        if (ReferenceEquals(y, null))
            return 1;

        var sortKeyFor_x = _compareInfo.GetSortKey(x, _compareOptions);
        var sortKeyFor_y = _compareInfo.GetSortKey(y, _compareOptions);
        return SortKey.Compare(sortKeyFor_x, sortKeyFor_y);
    }

    public override bool Equals(string x, string y)
    {
        return Compare(x, y) == 0;
    }

    public override int GetHashCode(string obj)
    {
        return _getHashCodeOfString(_compareInfo, obj, _compareOptions);
    }
}

The problem with this workaround is that it is hardly practical for sizable collections, because it is an order of magnitude slower than eg StringComparer.InvariantCulture . 这种解决方法的问题在于它对于大型集合来说几乎不实用,因为它比例如StringComparer.InvariantCulture慢一个数量级。

Time taken when sorting the given word list 1000 times using both comparers: 使用两个比较器对给定单词列表进行1000次排序所花费的时间:

StringComparer.InvariantCulture : 00:00:15.3120013
WorkAroundStringComparer        : 00:01:35.8322409

So I am still hoping that either Microsoft reconsiders or someone knows a viable alternative. 所以我仍然希望微软重新考虑或者有人知道一个可行的替代方案。 Otherwise the only option that remains is falling back on using StringComparer.Ordinal . 否则,剩下的唯一选择是使用StringComparer.Ordinal

Could it be related to .Net Framework 4/4.5? 它可能与.Net Framework 4 / 4.5有关吗? I have adapted your example for .Net 3.5 like this: 我已经调整了你的.Net 3.5示例:

var words = ReadFile("hamlet.txt");

//...

private static string[] ReadFile(string path)
{
    List<string> lines = new List<string>();
    using (StreamReader sr = new StreamReader(path))
    {
        string text = sr.ReadToEnd();
        lines.Add(text);
    }

    return lines.SelectMany(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries).Select(w => w.Trim()))
        .Where(w => !(w.ToCharArray().All(c => c == ' ')))
        .ToArray();
}

And both comparers work fine on XP using .Net 3.5. 两个比较器在使用.Net 3.5的XP上运行良好。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM