简体   繁体   English

C#中的自然排序

[英]Natural Sort Order in C#

Anyone have a good resource or provide a sample of a natural order sort in C# for an FileInfo array?任何人都有很好的资源或在 C# 中为FileInfo数组提供自然顺序排序的示例? I am implementing the IComparer interface in my sorts.我正在实现IComparer接口。

The easiest thing to do is just P/Invoke the built-in function in Windows, and use it as the comparison function in your IComparer :最简单的方法是 P/Invoke Windows 中的内置函数,并将其用作IComparer的比较函数:

[DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
private static extern int StrCmpLogicalW(string psz1, string psz2);

Michael Kaplan has some examples of how this function works here , and the changes that were made for Vista to make it work more intuitively. Michael Kaplan 提供了一些有关此功能如何工作的示例,以及为使 Vista 更直观地工作而对 Vista 所做的更改。 The plus side of this function is that it will have the same behaviour as the version of Windows it runs on, however this does mean that it differs between versions of Windows so you need to consider whether this is a problem for you.此功能的优点是它的行为与其运行的 Windows 版本相同,但这确实意味着它在 Windows 版本之间有所不同,因此您需要考虑这对您来说是否有问题。

So a complete implementation would be something like:所以一个完整的实现应该是这样的:

[SuppressUnmanagedCodeSecurity]
internal static class SafeNativeMethods
{
    [DllImport("shlwapi.dll", CharSet = CharSet.Unicode)]
    public static extern int StrCmpLogicalW(string psz1, string psz2);
}

public sealed class NaturalStringComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        return SafeNativeMethods.StrCmpLogicalW(a, b);
    }
}

public sealed class NaturalFileInfoNameComparer : IComparer<FileInfo>
{
    public int Compare(FileInfo a, FileInfo b)
    {
        return SafeNativeMethods.StrCmpLogicalW(a.Name, b.Name);
    }
}

Just thought I'd add to this (with the most concise solution I could find):只是想我会添加到这个(用我能找到的最简洁的解决方案):

public static IOrderedEnumerable<T> OrderByAlphaNumeric<T>(this IEnumerable<T> source, Func<T, string> selector)
{
    int max = source
        .SelectMany(i => Regex.Matches(selector(i), @"\d+").Cast<Match>().Select(m => (int?)m.Value.Length))
        .Max() ?? 0;

    return source.OrderBy(i => Regex.Replace(selector(i), @"\d+", m => m.Value.PadLeft(max, '0')));
}

The above pads any numbers in the string to the max length of all numbers in all strings and uses the resulting string to sort.以上将字符串中的任何数字填充到所有字符串中所有数字的最大长度,并使用结果字符串进行排序。

The cast to ( int? ) is to allow for collections of strings without any numbers ( .Max() on an empty enumerable throws an InvalidOperationException ).转换为 ( int? ) 是为了允许不带任何数字的字符串集合.Max()空枚举上的.Max()会抛出InvalidOperationException )。

None of the existing implementations looked great so I wrote my own.现有的实现看起来都不是很好,所以我自己写了。 The results are almost identical to the sorting used by modern versions of Windows Explorer (Windows 7/8).结果几乎与现代版本的 Windows 资源管理器 (Windows 7/8) 使用的排序相同。 The only differences I've seen are 1) although Windows used to (eg XP) handle numbers of any length, it's now limited to 19 digits - mine is unlimited, 2) Windows gives inconsistent results with certain sets of Unicode digits - mine works fine (although it doesn't numerically compare digits from surrogate pairs; nor does Windows), and 3) mine can't distinguish different types of non-primary sort weights if they occur in different sections (eg "e-1é" vs "é1e-" - the sections before and after the number have diacritic and punctuation weight differences).我看到的唯一区别是 1) 尽管 Windows 过去 (例如 XP) 处理任何长度的数字,但现在仅限于 19 位数字 - 我的是无限的,2) Windows 对某些 Unicode 数字集给出不一致的结果 - 我的作品很好(虽然它不会从数字上比较来自代理对的数字;Windows 也不会),并且 3)我无法区分不同类型的非主要排序权重,如果它们出现在不同的部分(例如“e-1é”与“ é1e-” - 数字前后的部分有变音符号和标点符号权重差异)。

public static int CompareNatural(string strA, string strB) {
    return CompareNatural(strA, strB, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase);
}

public static int CompareNatural(string strA, string strB, CultureInfo culture, CompareOptions options) {
    CompareInfo cmp = culture.CompareInfo;
    int iA = 0;
    int iB = 0;
    int softResult = 0;
    int softResultWeight = 0;
    while (iA < strA.Length && iB < strB.Length) {
        bool isDigitA = Char.IsDigit(strA[iA]);
        bool isDigitB = Char.IsDigit(strB[iB]);
        if (isDigitA != isDigitB) {
            return cmp.Compare(strA, iA, strB, iB, options);
        }
        else if (!isDigitA && !isDigitB) {
            int jA = iA + 1;
            int jB = iB + 1;
            while (jA < strA.Length && !Char.IsDigit(strA[jA])) jA++;
            while (jB < strB.Length && !Char.IsDigit(strB[jB])) jB++;
            int cmpResult = cmp.Compare(strA, iA, jA - iA, strB, iB, jB - iB, options);
            if (cmpResult != 0) {
                // Certain strings may be considered different due to "soft" differences that are
                // ignored if more significant differences follow, e.g. a hyphen only affects the
                // comparison if no other differences follow
                string sectionA = strA.Substring(iA, jA - iA);
                string sectionB = strB.Substring(iB, jB - iB);
                if (cmp.Compare(sectionA + "1", sectionB + "2", options) ==
                    cmp.Compare(sectionA + "2", sectionB + "1", options))
                {
                    return cmp.Compare(strA, iA, strB, iB, options);
                }
                else if (softResultWeight < 1) {
                    softResult = cmpResult;
                    softResultWeight = 1;
                }
            }
            iA = jA;
            iB = jB;
        }
        else {
            char zeroA = (char)(strA[iA] - (int)Char.GetNumericValue(strA[iA]));
            char zeroB = (char)(strB[iB] - (int)Char.GetNumericValue(strB[iB]));
            int jA = iA;
            int jB = iB;
            while (jA < strA.Length && strA[jA] == zeroA) jA++;
            while (jB < strB.Length && strB[jB] == zeroB) jB++;
            int resultIfSameLength = 0;
            do {
                isDigitA = jA < strA.Length && Char.IsDigit(strA[jA]);
                isDigitB = jB < strB.Length && Char.IsDigit(strB[jB]);
                int numA = isDigitA ? (int)Char.GetNumericValue(strA[jA]) : 0;
                int numB = isDigitB ? (int)Char.GetNumericValue(strB[jB]) : 0;
                if (isDigitA && (char)(strA[jA] - numA) != zeroA) isDigitA = false;
                if (isDigitB && (char)(strB[jB] - numB) != zeroB) isDigitB = false;
                if (isDigitA && isDigitB) {
                    if (numA != numB && resultIfSameLength == 0) {
                        resultIfSameLength = numA < numB ? -1 : 1;
                    }
                    jA++;
                    jB++;
                }
            }
            while (isDigitA && isDigitB);
            if (isDigitA != isDigitB) {
                // One number has more digits than the other (ignoring leading zeros) - the longer
                // number must be larger
                return isDigitA ? 1 : -1;
            }
            else if (resultIfSameLength != 0) {
                // Both numbers are the same length (ignoring leading zeros) and at least one of
                // the digits differed - the first difference determines the result
                return resultIfSameLength;
            }
            int lA = jA - iA;
            int lB = jB - iB;
            if (lA != lB) {
                // Both numbers are equivalent but one has more leading zeros
                return lA > lB ? -1 : 1;
            }
            else if (zeroA != zeroB && softResultWeight < 2) {
                softResult = cmp.Compare(strA, iA, 1, strB, iB, 1, options);
                softResultWeight = 2;
            }
            iA = jA;
            iB = jB;
        }
    }
    if (iA < strA.Length || iB < strB.Length) {
        return iA < strA.Length ? 1 : -1;
    }
    else if (softResult != 0) {
        return softResult;
    }
    return 0;
}

The signature matches the Comparison<string> delegate:签名与Comparison<string>委托相匹配:

string[] files = Directory.GetFiles(@"C:\");
Array.Sort(files, CompareNatural);

Here's a wrapper class for use as IComparer<string> :这是用作IComparer<string>的包装类:

public class CustomComparer<T> : IComparer<T> {
    private Comparison<T> _comparison;

    public CustomComparer(Comparison<T> comparison) {
        _comparison = comparison;
    }

    public int Compare(T x, T y) {
        return _comparison(x, y);
    }
}

Example:例子:

string[] files = Directory.EnumerateFiles(@"C:\")
    .OrderBy(f => f, new CustomComparer<string>(CompareNatural))
    .ToArray();

Here's a good set of filenames I use for testing:这是我用于测试的一组很好的文件名:

Func<string, string> expand = (s) => { int o; while ((o = s.IndexOf('\\')) != -1) { int p = o + 1;
    int z = 1; while (s[p] == '0') { z++; p++; } int c = Int32.Parse(s.Substring(p, z));
    s = s.Substring(0, o) + new string(s[o - 1], c) + s.Substring(p + z); } return s; };
string encodedFileNames =
    "KDEqLW4xMiotbjEzKjAwMDFcMDY2KjAwMlwwMTcqMDA5XDAxNyowMlwwMTcqMDlcMDE3KjEhKjEtISox" +
    "LWEqMS4yNT8xLjI1KjEuNT8xLjUqMSoxXDAxNyoxXDAxOCoxXDAxOSoxXDA2NioxXDA2NyoxYSoyXDAx" +
    "NyoyXDAxOCo5XDAxNyo5XDAxOCo5XDA2Nio9MSphMDAxdGVzdDAxKmEwMDF0ZXN0aW5nYTBcMzEqYTAw" +
    "Mj9hMDAyIGE/YTAwMiBhKmEwMDIqYTAwMmE/YTAwMmEqYTAxdGVzdGluZ2EwMDEqYTAxdnNmcyphMSph" +
    "MWEqYTF6KmEyKmIwMDAzcTYqYjAwM3E0KmIwM3E1KmMtZSpjZCpjZipmIDEqZipnP2cgMT9oLW4qaG8t" +
    "bipJKmljZS1jcmVhbT9pY2VjcmVhbT9pY2VjcmVhbS0/ajBcNDE/ajAwMWE/ajAxP2shKmsnKmstKmsx" +
    "KmthKmxpc3QqbTAwMDNhMDA1YSptMDAzYTAwMDVhKm0wMDNhMDA1Km0wMDNhMDA1YSpuMTIqbjEzKm8t" +
    "bjAxMypvLW4xMipvLW40P28tbjQhP28tbjR6P28tbjlhLWI1Km8tbjlhYjUqb24wMTMqb24xMipvbjQ/" +
    "b240IT9vbjR6P29uOWEtYjUqb245YWI1Km/CrW4wMTMqb8KtbjEyKnAwMCpwMDEqcDAxwr0hKnAwMcK9" +
    "KnAwMcK9YSpwMDHCvcK+KnAwMipwMMK9KnEtbjAxMypxLW4xMipxbjAxMypxbjEyKnItMDAhKnItMDAh" +
    "NSpyLTAwIe+8lSpyLTAwYSpyLe+8kFwxIS01KnIt77yQXDEhLe+8lSpyLe+8kFwxISpyLe+8kFwxITUq" +
    "ci3vvJBcMSHvvJUqci3vvJBcMWEqci3vvJBcMyE1KnIwMCEqcjAwLTUqcjAwLjUqcjAwNSpyMDBhKnIw" +
    "NSpyMDYqcjQqcjUqctmg2aYqctmkKnLZpSpy27Dbtipy27Qqctu1KnLfgN+GKnLfhCpy34UqcuClpuCl" +
    "rCpy4KWqKnLgpasqcuCnpuCnrCpy4KeqKnLgp6sqcuCppuCprCpy4KmqKnLgqasqcuCrpuCrrCpy4Kuq" +
    "KnLgq6sqcuCtpuCtrCpy4K2qKnLgrasqcuCvpuCvrCpy4K+qKnLgr6sqcuCxpuCxrCpy4LGqKnLgsasq" +
    "cuCzpuCzrCpy4LOqKnLgs6sqcuC1puC1rCpy4LWqKnLgtasqcuC5kOC5lipy4LmUKnLguZUqcuC7kOC7" +
    "lipy4LuUKnLgu5UqcuC8oOC8pipy4LykKnLgvKUqcuGBgOGBhipy4YGEKnLhgYUqcuGCkOGClipy4YKU" +
    "KnLhgpUqcuGfoOGfpipy4Z+kKnLhn6UqcuGgkOGglipy4aCUKnLhoJUqcuGlhuGljCpy4aWKKnLhpYsq" +
    "cuGnkOGnlipy4aeUKnLhp5UqcuGtkOGtlipy4a2UKnLhrZUqcuGusOGutipy4a60KnLhrrUqcuGxgOGx" +
    "hipy4bGEKnLhsYUqcuGxkOGxlipy4bGUKnLhsZUqcuqYoFwx6pilKnLqmKDqmKUqcuqYoOqYpipy6pik" +
    "KnLqmKUqcuqjkOqjlipy6qOUKnLqo5UqcuqkgOqkhipy6qSEKnLqpIUqcuqpkOqplipy6qmUKnLqqZUq" +
    "cvCQkqAqcvCQkqUqcvCdn5gqcvCdn50qcu+8kFwxISpy77yQXDEt77yVKnLvvJBcMS7vvJUqcu+8kFwx" +
    "YSpy77yQXDHqmKUqcu+8kFwx77yO77yVKnLvvJBcMe+8lSpy77yQ77yVKnLvvJDvvJYqcu+8lCpy77yV" +
    "KnNpKnPEsSp0ZXN02aIqdGVzdNmi2aAqdGVzdNmjKnVBZS0qdWFlKnViZS0qdUJlKnVjZS0xw6kqdWNl" +
    "McOpLSp1Y2Uxw6kqdWPDqS0xZSp1Y8OpMWUtKnVjw6kxZSp3ZWlhMSp3ZWlhMip3ZWlzczEqd2Vpc3My" +
    "KndlaXoxKndlaXoyKndlacOfMSp3ZWnDnzIqeSBhMyp5IGE0KnknYTMqeSdhNCp5K2EzKnkrYTQqeS1h" +
    "Myp5LWE0KnlhMyp5YTQqej96IDA1MD96IDIxP3ohMjE/ejIwP3oyMj96YTIxP3rCqTIxP1sxKl8xKsKt" +
    "bjEyKsKtbjEzKsSwKg==";
string[] fileNames = Encoding.UTF8.GetString(Convert.FromBase64String(encodedFileNames))
    .Replace("*", ".txt?").Split(new[] { "?" }, StringSplitOptions.RemoveEmptyEntries)
    .Select(n => expand(n)).ToArray();

Pure C# solution for linq orderby: linq orderby 的纯 C# 解决方案:

http://zootfroot.blogspot.com/2009/09/natural-sort-compare-with-linq-orderby.html http://zootfroot.blogspot.com/2009/09/natural-sort-compare-with-linq-orderby.html

public class NaturalSortComparer<T> : IComparer<string>, IDisposable
{
    private bool isAscending;

    public NaturalSortComparer(bool inAscendingOrder = true)
    {
        this.isAscending = inAscendingOrder;
    }

    #region IComparer<string> Members

    public int Compare(string x, string y)
    {
        throw new NotImplementedException();
    }

    #endregion

    #region IComparer<string> Members

    int IComparer<string>.Compare(string x, string y)
    {
        if (x == y)
            return 0;

        string[] x1, y1;

        if (!table.TryGetValue(x, out x1))
        {
            x1 = Regex.Split(x.Replace(" ", ""), "([0-9]+)");
            table.Add(x, x1);
        }

        if (!table.TryGetValue(y, out y1))
        {
            y1 = Regex.Split(y.Replace(" ", ""), "([0-9]+)");
            table.Add(y, y1);
        }

        int returnVal;

        for (int i = 0; i < x1.Length && i < y1.Length; i++)
        {
            if (x1[i] != y1[i])
            {
                returnVal = PartCompare(x1[i], y1[i]);
                return isAscending ? returnVal : -returnVal;
            }
        }

        if (y1.Length > x1.Length)
        {
            returnVal = 1;
        }
        else if (x1.Length > y1.Length)
        { 
            returnVal = -1; 
        }
        else
        {
            returnVal = 0;
        }

        return isAscending ? returnVal : -returnVal;
    }

    private static int PartCompare(string left, string right)
    {
        int x, y;
        if (!int.TryParse(left, out x))
            return left.CompareTo(right);

        if (!int.TryParse(right, out y))
            return left.CompareTo(right);

        return x.CompareTo(y);
    }

    #endregion

    private Dictionary<string, string[]> table = new Dictionary<string, string[]>();

    public void Dispose()
    {
        table.Clear();
        table = null;
    }
}

Matthews Horsleys answer is the fastest method which doesn't change behaviour depending on which version of windows your program is running on. Matthews Horsleys 的答案是最快的方法,它不会根据您的程序在哪个版本的 Windows 上运行而改变行为。 However, it can be even faster by creating the regex once, and using RegexOptions.Compiled.但是,通过创建一次正则表达式并使用 RegexOptions.Compiled 可以更快。 I also added the option of inserting a string comparer so you can ignore case if needed, and improved readability a bit.我还添加了插入字符串比较器的选项,以便您可以在需要时忽略大小写,并稍微提高了可读性。

    public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> items, Func<T, string> selector, StringComparer stringComparer = null)
    {
        var regex = new Regex(@"\d+", RegexOptions.Compiled);

        int maxDigits = items
                      .SelectMany(i => regex.Matches(selector(i)).Cast<Match>().Select(digitChunk => (int?)digitChunk.Value.Length))
                      .Max() ?? 0;

        return items.OrderBy(i => regex.Replace(selector(i), match => match.Value.PadLeft(maxDigits, '0')), stringComparer ?? StringComparer.CurrentCulture);
    }

Use by使用人

var sortedEmployees = employees.OrderByNatural(emp => emp.Name);

This takes 450ms to sort 100,000 strings compared to 300ms for the default .net string comparison - pretty fast!与默认的 .net 字符串比较的 300 毫秒相比,这需要 450 毫秒来对 100,000 个字符串进行排序 - 非常快!

My solution:我的解决方案:

void Main()
{
    new[] {"a4","a3","a2","a10","b5","b4","b400","1","C1d","c1d2"}.OrderBy(x => x, new NaturalStringComparer()).Dump();
}

public class NaturalStringComparer : IComparer<string>
{
    private static readonly Regex _re = new Regex(@"(?<=\D)(?=\d)|(?<=\d)(?=\D)", RegexOptions.Compiled);

    public int Compare(string x, string y)
    {
        x = x.ToLower();
        y = y.ToLower();
        if(string.Compare(x, 0, y, 0, Math.Min(x.Length, y.Length)) == 0)
        {
            if(x.Length == y.Length) return 0;
            return x.Length < y.Length ? -1 : 1;
        }
        var a = _re.Split(x);
        var b = _re.Split(y);
        int i = 0;
        while(true)
        {
            int r = PartCompare(a[i], b[i]);
            if(r != 0) return r;
            ++i;
        }
    }

    private static int PartCompare(string x, string y)
    {
        int a, b;
        if(int.TryParse(x, out a) && int.TryParse(y, out b))
            return a.CompareTo(b);
        return x.CompareTo(y);
    }
}

Results:结果:

1
a2
a3
a4
a10
b4
b5
b400
C1d
c1d2

You do need to be careful -- I vaguely recall reading that StrCmpLogicalW, or something like it, was not strictly transitive, and I have observed .NET's sort methods to sometimes get stuck in infinite loops if the comparison function breaks that rule.您确实需要小心——我依稀记得读过 StrCmpLogicalW 或类似的东西,不是严格可传递的,而且我观察到 .NET 的排序方法有时会在比较函数违反该规则时陷入无限循环。

A transitive comparison will always report that a < c if a < b and b < c.如果 a < b 和 b < c,传递比较将始终报告 a < c。 There exists a function that does a natural sort order comparison that does not always meet that criterion, but I can't recall whether it is StrCmpLogicalW or something else.存在一个执行自然排序顺序比较的函数,它并不总是满足该标准,但我不记得它是 StrCmpLogicalW 还是其他什么。

This is my code to sort a string having both alpha and numeric characters.这是我对具有字母和数字字符的字符串进行排序的代码。

First, this extension method:一、这个扩展方法:

public static IEnumerable<string> AlphanumericSort(this IEnumerable<string> me)
{
    return me.OrderBy(x => Regex.Replace(x, @"\d+", m => m.Value.PadLeft(50, '0')));
}

Then, simply use it anywhere in your code like this:然后,只需在代码中的任何位置使用它,如下所示:

List<string> test = new List<string>() { "The 1st", "The 12th", "The 2nd" };
test = test.AlphanumericSort();

How does it works ?它是如何工作的? By replaceing with zeros:通过用零替换:

  Original  | Regex Replace |      The      |   Returned
    List    | Apply PadLeft |    Sorting    |     List
            |               |               |
 "The 1st"  |  "The 001st"  |  "The 001st"  |  "The 1st"
 "The 12th" |  "The 012th"  |  "The 002nd"  |  "The 2nd"
 "The 2nd"  |  "The 002nd"  |  "The 012th"  |  "The 12th"

Works with multiples numbers:适用于倍数:

 Alphabetical Sorting | Alphanumeric Sorting
                      |
 "Page 21, Line 42"   | "Page 3, Line 7"
 "Page 21, Line 5"    | "Page 3, Line 32"
 "Page 3, Line 32"    | "Page 21, Line 5"
 "Page 3, Line 7"     | "Page 21, Line 42"

Hope that's will help.希望这会有所帮助。

Adding to Greg Beech's answer (because I've just been searching for that), if you want to use this from Linq you can use the OrderBy that takes an IComparer .添加到Greg Beech 的答案(因为我一直在寻找它),如果您想从 Linq 使用它,您可以使用带有IComparerOrderBy Eg:例如:

var items = new List<MyItem>();

// fill items

var sorted = items.OrderBy(item => item.Name, new NaturalStringComparer());

Here's a relatively simple example that doesn't use P/Invoke and avoids any allocation during execution.这是一个相对简单的示例,它不使用 P/Invoke 并避免在执行期间进行任何分配。

Feel free to use the code from here, or if it's easier there's a NuGet package:随意使用这里的代码,或者如果它更容易有一个 NuGet 包:

https://www.nuget.org/packages/NaturalSort https://www.nuget.org/packages/NaturalSort

https://github.com/drewnoakes/natural-sort https://github.com/drewnoakes/natural-sort

internal sealed class NaturalStringComparer : IComparer<string>
{
    public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();

    public int Compare(string x, string y)
    {
        // sort nulls to the start
        if (x == null)
            return y == null ? 0 : -1;
        if (y == null)
            return 1;

        var ix = 0;
        var iy = 0;

        while (true)
        {
            // sort shorter strings to the start
            if (ix >= x.Length)
                return iy >= y.Length ? 0 : -1;
            if (iy >= y.Length)
                return 1;

            var cx = x[ix];
            var cy = y[iy];

            int result;
            if (char.IsDigit(cx) && char.IsDigit(cy))
                result = CompareInteger(x, y, ref ix, ref iy);
            else
                result = cx.CompareTo(y[iy]);

            if (result != 0)
                return result;

            ix++;
            iy++;
        }
    }

    private static int CompareInteger(string x, string y, ref int ix, ref int iy)
    {
        var lx = GetNumLength(x, ix);
        var ly = GetNumLength(y, iy);

        // shorter number first (note, doesn't handle leading zeroes)
        if (lx != ly)
            return lx.CompareTo(ly);

        for (var i = 0; i < lx; i++)
        {
            var result = x[ix++].CompareTo(y[iy++]);
            if (result != 0)
                return result;
        }

        return 0;
    }

    private static int GetNumLength(string s, int i)
    {
        var length = 0;
        while (i < s.Length && char.IsDigit(s[i++]))
            length++;
        return length;
    }
}

It doesn't ignore leading zeroes, so 01 comes after 2 .它不会忽略前导零,因此012之后。

Corresponding unit test:对应的单元测试:

public class NumericStringComparerTests
{
    [Fact]
    public void OrdersCorrectly()
    {
        AssertEqual("", "");
        AssertEqual(null, null);
        AssertEqual("Hello", "Hello");
        AssertEqual("Hello123", "Hello123");
        AssertEqual("123", "123");
        AssertEqual("123Hello", "123Hello");

        AssertOrdered("", "Hello");
        AssertOrdered(null, "Hello");
        AssertOrdered("Hello", "Hello1");
        AssertOrdered("Hello123", "Hello124");
        AssertOrdered("Hello123", "Hello133");
        AssertOrdered("Hello123", "Hello223");
        AssertOrdered("123", "124");
        AssertOrdered("123", "133");
        AssertOrdered("123", "223");
        AssertOrdered("123", "1234");
        AssertOrdered("123", "2345");
        AssertOrdered("0", "1");
        AssertOrdered("123Hello", "124Hello");
        AssertOrdered("123Hello", "133Hello");
        AssertOrdered("123Hello", "223Hello");
        AssertOrdered("123Hello", "1234Hello");
    }

    private static void AssertEqual(string x, string y)
    {
        Assert.Equal(0, NaturalStringComparer.Instance.Compare(x, y));
        Assert.Equal(0, NaturalStringComparer.Instance.Compare(y, x));
    }

    private static void AssertOrdered(string x, string y)
    {
        Assert.Equal(-1, NaturalStringComparer.Instance.Compare(x, y));
        Assert.Equal( 1, NaturalStringComparer.Instance.Compare(y, x));
    }
}

I've actually implemented it as an extension method on the StringComparer so that you could do for example:我实际上已经将它实现为StringComparer上的扩展方法,以便您可以执行以下操作:

  • StringComparer.CurrentCulture.WithNaturalSort() or StringComparer.CurrentCulture.WithNaturalSort()
  • StringComparer.OrdinalIgnoreCase.WithNaturalSort() . StringComparer.OrdinalIgnoreCase.WithNaturalSort()

The resulting IComparer<string> can be used in all places like OrderBy , OrderByDescending , ThenBy , ThenByDescending , SortedSet<string> , etc. And you can still easily tweak case sensitivity, culture, etc.生成的IComparer<string>可用于所有地方,如OrderByOrderByDescendingThenByThenByDescendingSortedSet<string>等。您仍然可以轻松调整区分大小写、文化等。

The implementation is fairly trivial and it should perform quite well even on large sequences.实现相当简单,即使在大序列上也应该表现得很好。


I've also published it as a tiny NuGet package , so you can just do:我还将它发布为一个很小的NuGet 包,所以你可以这样做:

Install-Package NaturalSort.Extension

The code including XML documentation comments and suite of tests is available in the NaturalSort.Extension GitHub repository . NaturalSort.Extension GitHub 存储库中提供了包含 XML 文档注释和测试套件的代码。


The entire code is this (if you cannot use C# 7 yet, just install the NuGet package):整个代码是这样的(如果您还不能使用 C# 7,只需安装 NuGet 包):

public static class StringComparerNaturalSortExtension
{
    public static IComparer<string> WithNaturalSort(this StringComparer stringComparer) => new NaturalSortComparer(stringComparer);

    private class NaturalSortComparer : IComparer<string>
    {
        public NaturalSortComparer(StringComparer stringComparer)
        {
            _stringComparer = stringComparer;
        }

        private readonly StringComparer _stringComparer;
        private static readonly Regex NumberSequenceRegex = new Regex(@"(\d+)", RegexOptions.Compiled | RegexOptions.CultureInvariant);
        private static string[] Tokenize(string s) => s == null ? new string[] { } : NumberSequenceRegex.Split(s);
        private static ulong ParseNumberOrZero(string s) => ulong.TryParse(s, NumberStyles.None, CultureInfo.InvariantCulture, out var result) ? result : 0;

        public int Compare(string s1, string s2)
        {
            var tokens1 = Tokenize(s1);
            var tokens2 = Tokenize(s2);

            var zipCompare = tokens1.Zip(tokens2, TokenCompare).FirstOrDefault(x => x != 0);
            if (zipCompare != 0)
                return zipCompare;

            var lengthCompare = tokens1.Length.CompareTo(tokens2.Length);
            return lengthCompare;
        }
        
        private int TokenCompare(string token1, string token2)
        {
            var number1 = ParseNumberOrZero(token1);
            var number2 = ParseNumberOrZero(token2);

            var numberCompare = number1.CompareTo(number2);
            if (numberCompare != 0)
                return numberCompare;

            var stringCompare = _stringComparer.Compare(token1, token2);
            return stringCompare;
        }
    }
}

Here is a naive one-line regex-less LINQ way (borrowed from python):这是一个天真的单行无正则表达式 LINQ 方式(从 python 借来):

var alphaStrings = new List<string>() { "10","2","3","4","50","11","100","a12","b12" };
var orderedString = alphaStrings.OrderBy(g => new Tuple<int, string>(g.ToCharArray().All(char.IsDigit)? int.Parse(g) : int.MaxValue, g));
// Order Now: ["2","3","4","10","11","50","100","a12","b12"]

Inspired by Michael Parker's solution, here is an IComparer implementation that you can drop in to any of the linq ordering methods:受 Michael Parker 解决方案的启发,这里有一个IComparer实现,您可以将其放入任何 linq 排序方法:

private class NaturalStringComparer : IComparer<string>
{
    public int Compare(string left, string right)
    {
        int max = new[] { left, right }
            .SelectMany(x => Regex.Matches(x, @"\d+").Cast<Match>().Select(y => (int?)y.Value.Length))
            .Max() ?? 0;

        var leftPadded = Regex.Replace(left, @"\d+", m => m.Value.PadLeft(max, '0'));
        var rightPadded = Regex.Replace(right, @"\d+", m => m.Value.PadLeft(max, '0'));

        return string.Compare(leftPadded, rightPadded);
    }
}

Here's a version for .NET Core 2.1+ / .NET 5.0+, using spans to avoid allocations这是 .NET Core 2.1+ / .NET 5.0+ 的一个版本,使用跨度来避免分配

public class NaturalSortStringComparer : IComparer<string>
{
    public static NaturalSortStringComparer Ordinal { get; } = new NaturalSortStringComparer(StringComparison.Ordinal);
    public static NaturalSortStringComparer OrdinalIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.OrdinalIgnoreCase);
    public static NaturalSortStringComparer CurrentCulture { get; } = new NaturalSortStringComparer(StringComparison.CurrentCulture);
    public static NaturalSortStringComparer CurrentCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.CurrentCultureIgnoreCase);
    public static NaturalSortStringComparer InvariantCulture { get; } = new NaturalSortStringComparer(StringComparison.InvariantCulture);
    public static NaturalSortStringComparer InvariantCultureIgnoreCase { get; } = new NaturalSortStringComparer(StringComparison.InvariantCultureIgnoreCase);

    private readonly StringComparison _comparison;

    public NaturalSortStringComparer(StringComparison comparison)
    {
        _comparison = comparison;
    }

    public int Compare(string? x, string? y)
    {
        // Let string.Compare handle the case where x or y is null
        if (x is null || y is null)
            return string.Compare(x, y, _comparison);

        var xSegments = GetSegments(x);
        var ySegments = GetSegments(y);

        while (xSegments.MoveNext() && ySegments.MoveNext())
        {
            bool xIsNumber = int.TryParse(xSegments.Current, out var xValue);
            bool yIsNumber = int.TryParse(ySegments.Current, out var yValue);

            int cmp;

            // If they're both numbers, compare the value
            if (xIsNumber && yIsNumber)
            {
                cmp = xValue.CompareTo(yValue);
                if (cmp != 0)
                    return cmp;
            }
            // If x is a number and y is not, x is "lesser than" y
            else if (xIsNumber)
            {
                return -1;
            }
            // If y is a number and x is not, x is "greater than" y
            else if (yIsNumber)
            {
                return 1;
            }

            // OK, neither are number, compare the segments as text
            cmp = xSegments.Current.CompareTo(ySegments.Current, _comparison);
            if (cmp != 0)
                return cmp;
        }

        // At this point, either all segments are equal, or one string is shorter than the other

        // If x is shorter, it's "lesser than" y
        if (x.Length < y.Length)
            return -1;
        // If x is longer, it's "greater than" y
        if (x.Length > y.Length)
            return 1;

        // If they have the same length, they're equal
        return 0;
    }

    private static StringSegmentEnumerator GetSegments(string s) => new StringSegmentEnumerator(s);

    private struct StringSegmentEnumerator
    {
        private readonly string _s;
        private int _start;
        private int _length;
        private int _currentPosition;

        public StringSegmentEnumerator(string s)
        {
            _s = s;
            _start = -1;
            _length = 0;
            _currentPosition = 0;
        }

        public ReadOnlySpan<char> Current => _s.AsSpan(_start, _length);

        public bool MoveNext()
        {
            if (_currentPosition >= _s.Length)
                return false;

            int start = _currentPosition;
            bool isFirstCharDigit = Char.IsDigit(_s[_currentPosition]);

            while (++_currentPosition < _s.Length && Char.IsDigit(_s[_currentPosition]) == isFirstCharDigit)
            {
            }

            _start = start;
            _length = _currentPosition - start;
            return true;
        }
    }
}

Expanding on a couple of the previous answers and making use of extension methods, I came up with the following that doesn't have the caveats of potential multiple enumerable enumeration, or performance issues concerned with using multiple regex objects, or calling regex needlessly, that being said, it does use ToList(), which can negate the benefits in larger collections.扩展前面的几个答案并使用扩展方法,我想出了以下内容,它没有潜在的多个可枚举枚举的警告,或者与使用多个正则表达式对象有关的性能问题,或者不必要地调用正则表达式,即话虽如此,它确实使用了 ToList(),这可以抵消较大集合中的好处。

The selector supports generic typing to allow any delegate to be assigned, the elements in the source collection are mutated by the selector, then converted to strings with ToString().选择器支持泛型类型以允许分配任何委托,源集合中的元素由选择器改变,然后使用 ToString() 转换为字符串。

    private static readonly Regex _NaturalOrderExpr = new Regex(@"\d+", RegexOptions.Compiled);

    public static IEnumerable<TSource> OrderByNatural<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        int max = 0;

        var selection = source.Select(
            o =>
            {
                var v = selector(o);
                var s = v != null ? v.ToString() : String.Empty;

                if (!String.IsNullOrWhiteSpace(s))
                {
                    var mc = _NaturalOrderExpr.Matches(s);

                    if (mc.Count > 0)
                    {
                        max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
                    }
                }

                return new
                {
                    Key = o,
                    Value = s
                };
            }).ToList();

        return
            selection.OrderBy(
                o =>
                String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
                     .Select(o => o.Key);
    }

    public static IEnumerable<TSource> OrderByDescendingNatural<TSource, TKey>(
        this IEnumerable<TSource> source, Func<TSource, TKey> selector)
    {
        int max = 0;

        var selection = source.Select(
            o =>
            {
                var v = selector(o);
                var s = v != null ? v.ToString() : String.Empty;

                if (!String.IsNullOrWhiteSpace(s))
                {
                    var mc = _NaturalOrderExpr.Matches(s);

                    if (mc.Count > 0)
                    {
                        max = Math.Max(max, mc.Cast<Match>().Max(m => m.Value.Length));
                    }
                }

                return new
                {
                    Key = o,
                    Value = s
                };
            }).ToList();

        return
            selection.OrderByDescending(
                o =>
                String.IsNullOrWhiteSpace(o.Value) ? o.Value : _NaturalOrderExpr.Replace(o.Value, m => m.Value.PadLeft(max, '0')))
                     .Select(o => o.Key);
    }

We had a need for a natural sort to deal with text with the following pattern:我们需要一种自然排序来处理具有以下模式的文本:

"Test 1-1-1 something"
"Test 1-2-3 something"
...

For some reason when I first looked on SO, I didn't find this post and implemented our own.出于某种原因,当我第一次查看 SO 时,我没有找到这篇文章并实现了我们自己的。 Compared to some of the solutions presented here, while similar in concept, it could have the benefit of maybe being simpler and easier to understand.与这里介绍的一些解决方案相比,虽然概念相似,但它可能具有更简单、更易于理解的好处。 However, while I did try to look at performance bottlenecks, It is still a much slower implementation than the default OrderBy() .但是,虽然我确实尝试查看性能瓶颈,但它的实现仍然比默认的OrderBy()慢得多。

Here is the extension method I implement:这是我实现的扩展方法:

public static class EnumerableExtensions
{
    // set up the regex parser once and for all
    private static readonly Regex Regex = new Regex(@"\d+|\D+", RegexOptions.Compiled | RegexOptions.Singleline);

    // stateless comparer can be built once
    private static readonly AggregateComparer Comparer = new AggregateComparer();

    public static IEnumerable<T> OrderByNatural<T>(this IEnumerable<T> source, Func<T, string> selector)
    {
        // first extract string from object using selector
        // then extract digit and non-digit groups
        Func<T, IEnumerable<IComparable>> splitter =
            s => Regex.Matches(selector(s))
                      .Cast<Match>()
                      .Select(m => Char.IsDigit(m.Value[0]) ? (IComparable) int.Parse(m.Value) : m.Value);
        return source.OrderBy(splitter, Comparer);
    }

    /// <summary>
    /// This comparer will compare two lists of objects against each other
    /// </summary>
    /// <remarks>Objects in each list are compare to their corresponding elements in the other
    /// list until a difference is found.</remarks>
    private class AggregateComparer : IComparer<IEnumerable<IComparable>>
    {
        public int Compare(IEnumerable<IComparable> x, IEnumerable<IComparable> y)
        {
            return
                x.Zip(y, (a, b) => new {a, b})              // walk both lists
                 .Select(pair => pair.a.CompareTo(pair.b))  // compare each object
                 .FirstOrDefault(result => result != 0);    // until a difference is found
        }
    }
}

The idea is to split the original strings into blocks of digits and non-digits ( "\\d+|\\D+" ).这个想法是将原始字符串分成数字块和非数字块( "\\d+|\\D+" )。 Since this is a potentially expensive task, it is done only once per entry.由于这是一项潜在的昂贵任务,因此每个条目仅执行一次。 We then use a comparer of comparable objects (sorry, I can't find a more proper way to say it).然后我们使用可比较对象的比较器(抱歉,我找不到更合适的表达方式)。 It compares each block to its corresponding block in the other string.它将每个块与另一个字符串中的相应块进行比较。

I would like feedback on how this could be improved and what the major flaws are.我想就如何改进这一点以及主要缺陷是什么提供反馈。 Note that maintainability is important to us at this point and we are not currently using this in extremely large data sets.请注意,此时可维护性对我们很重要,我们目前没有在极大的数据集中使用它。

A version that's easier to read/maintain.一个更容易阅读/维护的版本。

public class NaturalStringComparer : IComparer<string>
{
    public static NaturalStringComparer Instance { get; } = new NaturalStringComparer();

    public int Compare(string x, string y) {
        const int LeftIsSmaller = -1;
        const int RightIsSmaller = 1;
        const int Equal = 0;

        var leftString = x;
        var rightString = y;

        var stringComparer = CultureInfo.CurrentCulture.CompareInfo;

        int rightIndex;
        int leftIndex;

        for (leftIndex = 0, rightIndex = 0;
             leftIndex < leftString.Length && rightIndex < rightString.Length;
             leftIndex++, rightIndex++) {
            var leftChar = leftString[leftIndex];
            var rightChar = rightString[leftIndex];

            var leftIsNumber = char.IsNumber(leftChar);
            var rightIsNumber = char.IsNumber(rightChar);

            if (!leftIsNumber && !rightIsNumber) {
                var result = stringComparer.Compare(leftString, leftIndex, 1, rightString, leftIndex, 1);
                if (result != 0) return result;
            } else if (leftIsNumber && !rightIsNumber) {
                return LeftIsSmaller;
            } else if (!leftIsNumber && rightIsNumber) {
                return RightIsSmaller;
            } else {
                var leftNumberLength = NumberLength(leftString, leftIndex, out var leftNumber);
                var rightNumberLength = NumberLength(rightString, rightIndex, out var rightNumber);

                if (leftNumberLength < rightNumberLength) {
                    return LeftIsSmaller;
                } else if (leftNumberLength > rightNumberLength) {
                    return RightIsSmaller;
                } else {
                    if(leftNumber < rightNumber) {
                        return LeftIsSmaller;
                    } else if(leftNumber > rightNumber) {
                        return RightIsSmaller;
                    }
                }
            }
        }

        if (leftString.Length < rightString.Length) {
            return LeftIsSmaller;
        } else if(leftString.Length > rightString.Length) {
            return RightIsSmaller;
        }

        return Equal;
    }

    public int NumberLength(string str, int offset, out int number) {
        if (string.IsNullOrWhiteSpace(str)) throw new ArgumentNullException(nameof(str));
        if (offset >= str.Length) throw new ArgumentOutOfRangeException(nameof(offset), offset, "Offset must be less than the length of the string.");

        var currentOffset = offset;

        var curChar = str[currentOffset];

        if (!char.IsNumber(curChar))
            throw new ArgumentException($"'{curChar}' is not a number.", nameof(offset));

        int length = 1;

        var numberString = string.Empty;

        for (currentOffset = offset + 1;
            currentOffset < str.Length;
            currentOffset++, length++) {

            curChar = str[currentOffset];
            numberString += curChar;

            if (!char.IsNumber(curChar)) {
                number = int.Parse(numberString);

                return length;
            }
        }

        number = int.Parse(numberString);

        return length;
    }
}

Let me explain my problem and how i was able to solve it.让我解释一下我的问题以及我是如何解决它的。

Problem:- Sort files based on FileName from FileInfo objects which are retrieved from a Directory.问题:- 根据从目录中检索的 FileInfo 对象中的 FileName 对文件进行排序。

Solution:- I selected the file names from FileInfo and trimed the ".png" part of the file name.解决方案:- 我从 FileInfo 中选择了文件名并修剪了文件名的“.png”部分。 Now, just do List.Sort(), which sorts the filenames in Natural sorting order.现在,只需执行 List.Sort(),它按自然排序顺序对文件名进行排序。 Based on my testing i found that having .png messes up sorting order.根据我的测试,我发现 .png 会弄乱排序顺序。 Have a look at the below code看看下面的代码

var imageNameList = new DirectoryInfo(@"C:\Temp\Images").GetFiles("*.png").Select(x =>x.Name.Substring(0, x.Name.Length - 4)).ToList();
imageNameList.Sort();

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM