Compare Two Arrays of Different Lengths and Show Differences

Problem:
I have two arrays that may be different lengths. I need to iterate through both arrays and find similarities, additions, and deletions.

What's the fastest and most efficient way to accomplish this in C#?

Edit: The arrays are pre-sorted and can contain anywhere from 50 to 100 items. Also, there aren't any constraints on speed and/or memory usage (however, no one likes a memory hog ;)

For example:

String[] Foo_Old = {"test1", "test2", "test3"};
String[] Foo_New = {"test1", "test2", "test4", "test5"};

AND

String[] Bar_Old = {"test1", "test2", "test4"};
String[] Bar_New = {"test1", "test3"};

Differences:

(with respect to the Foo_New array)

[Same]    "test1"
[Same]    "test2"
[Removed] "test3"
[Added]   "test4"
[Added]   "test5"

(with respect to the Bar_New array)

[Same]    "test1"
[Removed] "test2"
[Removed] "test4"
[Added]   "test3"

You can use Except and Intersect:

var Foo_Old = new[] { "test1", "test2", "test3" }; 
var Foo_New = new[] { "test1", "test2", "test4", "test5" };

var diff = Foo_New.Except( Foo_Old );
var inter = Foo_New.Intersect( Foo_Old );
var rem = Foo_Old.Except(Foo_New);

foreach (var s in diff)
{
    Console.WriteLine("Added " + s);
}

foreach (var s in inter)
{
    Console.WriteLine("Same " + s);
}

foreach (var s in rem)
{
    Console.WriteLine("Removed " + s);
}
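
One thing worth keeping in mind with this approach: Except and Intersect are set operations, so each distinct value is reported once even if an input contains duplicates. A minimal sketch of that behavior follows. The arrays here are made up for illustration (they are not from the question), and the snippet assumes a compiler that supports top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical inputs: "test4" appears twice in the new array.
string[] oldItems = { "test1", "test2", "test3" };
string[] newItems = { "test1", "test4", "test4" };

// Except and Intersect yield distinct elements only,
// in the order they first appear in the left-hand sequence.
string[] added = newItems.Except(oldItems).ToArray();
string[] same = newItems.Intersect(oldItems).ToArray();
string[] removed = oldItems.Except(newItems).ToArray();

Console.WriteLine("Added:   " + string.Join(", ", added));   // Added:   test4
Console.WriteLine("Same:    " + string.Join(", ", same));    // Same:    test1
Console.WriteLine("Removed: " + string.Join(", ", removed)); // Removed: test2, test3
```

If duplicate counts matter for your data, a single-pass walk over the sorted arrays is probably a better fit than the set operators.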

I went ahead and hand-coded one and benchmarked it against the example in the accepted answer; the hand-coded one performs a little better. I handled outputting my strings a little differently. Other factors to consider include whether Except makes a sorted copy of the array (since it cannot assume the input is sorted) or whether it builds some kind of hash or does a linear search (it is actually restricted to IEnumerable; for very large arrays that are already sorted, this could be a problem). You could change mine to compare IEnumerable (which is more general) instead of IComparable[].

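// Requires: using System; using System.Diagnostics; using System.Linq;
// Single-pass comparison of two pre-sorted arrays.
static void ArrayCompare(IComparable[] Old, IComparable[] New)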
{
    int lpOld = 0;
    int lpNew = 0;
    int OldLength = Old.Length;
    int NewLength = New.Length;
    while (lpOld < OldLength || lpNew < NewLength)
    {
        int compare;

        if (lpOld >= OldLength) compare = 1;
        else if (lpNew >= NewLength) compare = -1;
        else compare = Old[lpOld].CompareTo(New[lpNew]);

        if (compare < 0)
        {
            Debug.WriteLine(string.Format("[Removed] {0}", Old[lpOld].ToString()));
            lpOld++;
        }
        else if (compare > 0)
        {
            Debug.WriteLine(string.Format("[Added] {0}", New[lpNew].ToString()));
            lpNew++;
        }
        else
        {
            Debug.WriteLine(string.Format("[Same] {0}", Old[lpOld].ToString()));
            lpOld++;
            lpNew++;
        }
    }
}

static void ArrayCompare2(IComparable[] Old, IComparable[] New) {
    var diff = New.Except( Old );
    var inter = New.Intersect( Old );
    var rem = Old.Except(New);

    foreach (var s in diff)
    {
        Debug.WriteLine("Added " + s);
    }

    foreach (var s in inter)
    {
        Debug.WriteLine("Same " + s);
    }

    foreach (var s in rem)
    {
        Debug.WriteLine("Removed " + s);
    }
}

static void Main(string[] args)
{
    String[] Foo_Old = {"test1", "test2", "test3"};
    String[] Foo_New = {"test1", "test2", "test4", "test5"};
    String[] Bar_Old = {"test1", "test2", "test4"};
    String[] Bar_New = {"test1", "test3"};

    Stopwatch w1 = new Stopwatch();
    w1.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare(Foo_Old, Foo_New);
        ArrayCompare(Bar_Old, Bar_New);
    }
    w1.Stop();

    Stopwatch w2 = new Stopwatch();
    w2.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare2(Foo_Old, Foo_New);
        ArrayCompare2(Bar_Old, Bar_New);
    }
    w2.Stop();

    Debug.WriteLine(w1.Elapsed.ToString());
    Debug.WriteLine(w2.Elapsed.ToString());
}
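
The IEnumerable variant mentioned above could be sketched roughly as follows. This is only one possible shape, not the code that was benchmarked: DiffSorted, its tuple return type, and the choice of StringComparer.Ordinal are all assumptions made for illustration, and both inputs are still assumed to be sorted according to the comparer passed in. The snippet uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not from the original answer): lazily diffs two
// sequences that are already sorted according to `comparer`.
static IEnumerable<(string Action, T Value)> DiffSorted<T>(
    IEnumerable<T> oldItems, IEnumerable<T> newItems, IComparer<T> comparer)
{
    using var oldIt = oldItems.GetEnumerator();
    using var newIt = newItems.GetEnumerator();
    bool hasOld = oldIt.MoveNext();
    bool hasNew = newIt.MoveNext();

    while (hasOld || hasNew)
    {
        // When one side is exhausted, everything left on the other side
        // is reported as added or removed.
        int compare = !hasOld ? 1
                    : !hasNew ? -1
                    : comparer.Compare(oldIt.Current, newIt.Current);

        if (compare < 0)
        {
            yield return ("Removed", oldIt.Current);
            hasOld = oldIt.MoveNext();
        }
        else if (compare > 0)
        {
            yield return ("Added", newIt.Current);
            hasNew = newIt.MoveNext();
        }
        else
        {
            yield return ("Same", oldIt.Current);
            hasOld = oldIt.MoveNext();
            hasNew = newIt.MoveNext();
        }
    }
}

string[] barOld = { "test1", "test2", "test4" };
string[] barNew = { "test1", "test3" };

foreach (var (action, value) in DiffSorted(barOld, barNew, StringComparer.Ordinal))
{
    Console.WriteLine($"[{action}] {value}");
}
```

Passing the comparer explicitly keeps the sort order and the comparison order in agreement; the inputs must have been sorted with the same comparer for the walk to be correct.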

Since your arrays are sorted, you should be able to walk both arrays simultaneously and, in one pass, determine whether each element is in the other array. (This is similar to the merge step in merge sort.) You can see a sample of that below:

string[] oldVersion = { "test1", "test2", "test3" };
string[] newVersion = { "test1", "test2", "test4", "test5" };

int oldIndex = 0, newIndex = 0;

while ((oldIndex < oldVersion.Length) && (newIndex < newVersion.Length)) {
   int comparison = oldVersion[oldIndex].CompareTo(newVersion[newIndex]);

   if (comparison < 0)
      Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);
   else if (comparison > 0)
      Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);
   else {
      Console.WriteLine("[Same]\t\t" + oldVersion[oldIndex++]);
      newIndex++;
   }
}

while (oldIndex < oldVersion.Length)
   Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);

while (newIndex < newVersion.Length)
   Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);

Alternatively, you'd need to go through one array and, for each element in it, do a single pass of the other array looking for a match.
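
A rough sketch of that nested-scan alternative, for comparison. It does not rely on the arrays being sorted, but it performs on the order of n × m comparisons instead of n + m. The variable names mirror the sample above, the input values are the Bar arrays from the question, and collecting the output in a `lines` list is just an illustrative choice. The snippet uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

string[] oldVersion = { "test1", "test2", "test4" };
string[] newVersion = { "test1", "test3" };

var lines = new List<string>();

// For each old element, scan the whole new array for a match.
foreach (string item in oldVersion)
{
    bool found = Array.IndexOf(newVersion, item) >= 0;
    lines.Add((found ? "[Same] " : "[Removed] ") + item);
}

// Anything in the new array that is missing from the old one was added.
foreach (string item in newVersion)
{
    if (Array.IndexOf(oldVersion, item) < 0)
        lines.Add("[Added] " + item);
}

foreach (string line in lines)
    Console.WriteLine(line);
```

For 50 to 100 items either approach is fast; the difference only starts to matter for much larger inputs.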

Edit: JP has a good suggestion on how to do this using the framework. However, assuming the arrays are sorted, the benefit of my approach is that you only need one pass to find all the results, rather than three.

I wrote this a while back:

Usage:

foreach (var diff in Foo_Old.Diff(Foo_New)){
   Console.WriteLine ("{0} action performed on {1}",diff.DiffAction,diff.Value);
}

Implementation:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqExtensions {

    enum DiffAction {
       Added,
       Removed,
       Same
    }

    class DiffPair<T> {
        public T Value { get; set; }
        public DiffAction DiffAction { get; set; }
    }

    static class DiffExtension {
        public static IEnumerable<DiffPair<T>> Diff<T>
                 (
                     this IEnumerable<T> original,
                     IEnumerable<T> target 
                 ) {

            Dictionary<T, DiffAction> results = new Dictionary<T, DiffAction>();

            foreach (var item in original) {
                results[item] = DiffAction.Removed;
            }

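            foreach (var item in target) {
                DiffAction existing;
                if (results.TryGetValue(item, out existing)) {
                    // Only an item that came from original becomes Same;
                    // a new item repeated in target must stay Added.
                    if (existing == DiffAction.Removed) {
                        results[item] = DiffAction.Same;
                    }
                } else {
                    results[item] = DiffAction.Added;
                }
            }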
            return results.Select(
                pair => new DiffPair<T> {
                    Value=pair.Key, 
                    DiffAction = pair.Value
                });
        }
    }

}
