简体   繁体   中英

Compare Two Arrays Of Different Lengths and Show Differences


I have two arrays that can possibly be different lengths. I need to iterate through both arrays and find similarities, additions, and deletions.

What's the fastest and most efficient way to accomplish this in C#?

Edit: The arrays are pre-sorted and they can contain anywhere between 50-100 items. Also, there aren't any constraints on speed and/or memory usage (however, no one likes a memory hog;)


String[] Foo_Old = {"test1", "test2", "test3"};
String[] Foo_New = {"test1", "test2", "test4", "test5"};

AND

String[] Bar_Old = {"test1", "test2", "test4"};
String[] Bar_New = {"test1", "test3"};

(with respect to the Foo_New array)

[Same]    "test1"
[Same]    "test2"
[Removed] "test3"
[Added]   "test4"
[Added]   "test5"

(with respect to the Bar_New array)

[Same]    "test1"
[Removed] "test2"
[Removed] "test4"
[Added]   "test3"

You can use Except and Intersect ...

var Foo_Old = new[] { "test1", "test2", "test3" }; 
var Foo_New = new[] { "test1", "test2", "test4", "test5" };

var diff = Foo_New.Except( Foo_Old );
var inter = Foo_New.Intersect( Foo_Old );
var rem = Foo_Old.Except(Foo_New);

foreach (var s in diff)
{
    Console.WriteLine("Added " + s);
}

foreach (var s in inter)
{
    Console.WriteLine("Same " + s);
}

foreach (var s in rem)
{
    Console.WriteLine("Removed " + s);
}

I went ahead and hand-coded one and use the example in the accepted answer, and the hand-coded one performs a little better. I handled outputting my strings a little differently. Other factors to consider include whether the Except make a sorted copy of the array (since it cannot assume it's sorted) or whether it makes some kind of hash or a linear search (it's actually restricted to IEnumerable - for very large arrays which are already sorted, this could be a problem). You could change mine to compare IEnumerable (which is more general) instead of IComparable[].

static void ArrayCompare(IComparable[] Old, IComparable[] New)
{
    int lpOld = 0;
    int lpNew = 0;
    int OldLength = Old.Length;
    int NewLength = New.Length;
    while (lpOld < OldLength || lpNew < NewLength)
    {
        int compare;

        if (lpOld >= OldLength) compare = 1;
        else if (lpNew >= NewLength) compare = -1;
        else compare = Old[lpOld].CompareTo(New[lpNew]);

        if (compare < 0)
        {
            Debug.WriteLine(string.Format("[Removed] {0}", Old[lpOld].ToString()));
            lpOld++;
        }
        else if (compare > 0)
        {
            Debug.WriteLine(string.Format("[Added] {0}", New[lpNew].ToString()));
            lpNew++;
        }
        else
        {
            Debug.WriteLine(string.Format("[Same] {0}", Old[lpOld].ToString()));
            lpOld++;
            lpNew++;
        }
    }
}

static void ArrayCompare2(IComparable[] Old, IComparable[] New) {
    var diff = New.Except( Old );
    var inter = New.Intersect( Old );
    var rem = Old.Except(New);

    foreach (var s in diff)
    {
        Debug.WriteLine("Added " + s);
    }

    foreach (var s in inter)
    {
        Debug.WriteLine("Same " + s);
    }

    foreach (var s in rem)
    {
        Debug.WriteLine("Removed " + s);
    }
}

static void Main(string[] args)
{
    String[] Foo_Old = {"test1", "test2", "test3"};
    String[] Foo_New = {"test1", "test2", "test4", "test5"};
    String[] Bar_Old = {"test1", "test2", "test4"};
    String[] Bar_New = {"test1", "test3"};

    Stopwatch w1 = new Stopwatch();
    w1.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare(Foo_Old, Foo_New);
        ArrayCompare(Bar_Old, Bar_New);
    }
    w1.Stop();

    Stopwatch w2 = new Stopwatch();
    w2.Start();
    for (int lp = 0; lp < 10000; lp++)
    {
        ArrayCompare2(Foo_Old, Foo_New);
        ArrayCompare2(Bar_Old, Bar_New);
    }
    w2.Stop();

    Debug.WriteLine(w1.Elapsed.ToString());
    Debug.WriteLine(w2.Elapsed.ToString());
}

Since your arrays are sorted, you should be able to just go through the arrays simultaneously, and in one pass and determine if each element is in the other array. (Similar to the merge step in merge sort.) You can see a sample of that below:

string[] oldVersion = { "test1", "test2", "test3" };
string[] newVersion = { "test1", "test2", "test4", "test5" };

int oldIndex = 0, newIndex = 0;

while ((oldIndex < oldVersion.Length) && (newIndex < newVersion.Length)) {
   int comparison = oldVersion[oldIndex].CompareTo(newVersion[newIndex]);

   if (comparison < 0)
      Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);
   else if (comparison > 0)
      Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);
   else {
      Console.WriteLine("[Same]\t\t" + oldVersion[oldIndex++]);
      newIndex++;
   }
}

while (oldIndex < oldVersion.Length)
   Console.WriteLine("[Removed]\t" + oldVersion[oldIndex++]);

while (newIndex < newVersion.Length)
   Console.WriteLine("[Added]\t\t" + newVersion[newIndex++]);

Alternatively you'd need to go through one array, and for each element in this array, do a single pass of the other array looking for a match.

Edit: JP has a good suggestion on how to do this using the framework. Although, assuming the arrays are sorted, the benefit of my approach is that you only have to do one pass to find all the results. You would not have to do three passes.

I wrote this a while back:

Usage:

foreach (var diff in Foo_Old.Diff(Foo_New)){
   Console.WriteLine ("{0} action performed on {1}",diff.DiffAction,diff.Value);
}

Implementation:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace LinqExtensions {

    enum DiffAction {
       Added,
       Removed,
       Same
    }

    class DiffPair<T> {
        public T Value { get; set; }
        public DiffAction DiffAction { get; set; }
    }

    static class DiffExtension {
        public static IEnumerable<DiffPair<T>> Diff<T>
                 (
                     this IEnumerable<T> original,
                     IEnumerable<T> target 
                 ) {

            Dictionary<T, DiffAction> results = new Dictionary<T, DiffAction>();

            foreach (var item in original) {
                results[item] = DiffAction.Removed;
            }

            foreach (var item in target) {
                if (results.ContainsKey(item)) {
                    results[item] = DiffAction.Same;
                } else {
                    results[item] = DiffAction.Added;
                }
            }
            return results.Select(
                pair => new DiffPair<T> {
                    Value=pair.Key, 
                    DiffAction = pair.Value
                });
        }
    }

}

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM