
Synchronizing two enumerables

I have two collections.

  var a = new List<string>() { "a", "b", "c", "d", "e", "f", "j" };
  var b = new List<string>() { "a", "c", "d", "h", "i" };

I would like to perform an action on each item that is missing from one collection or the other.

public static void Synchronize<T>(IEnumerable<T> first, IEnumerable<T> second, Action<T> firstSynchronizer, Action<T> secondSynchronizer)
{
  var firstUnique = first.Distinct();
  var secondUnique = second.Distinct();
  foreach (var item in firstUnique)
  {
    if (!secondUnique.Contains(item)) firstSynchronizer(item);
  }
  foreach (var item in second.Distinct())
  {
    if (!firstUnique.Contains(item)) secondSynchronizer(item);
  }
}

This is what I have, but I am not happy with it. I can't help but wonder if there's a better way to implement this, because I think Distinct() is a pretty big performance hit. I am also not sure whether it's better to iterate over the whole second enumerable and check that each item is not already present in the first one (as above), or to iterate over second.Except(first). What do you think?

I call it like this:

  var a = new List<string>() { "a", "b", "c", "d", "e", "f", "j" };
  var b = new List<string>() { "a", "c", "d", "h", "i" };
  Synchronize(a.ToArray(), b.ToArray(), t => b.Add(t), t => a.Add(t));

I call ToArray() so that the collections don't change while they are being iterated over; the lambdas just add the missing elements to the respective lists.

Also, this is just a test implementation. In production, the two enumerables won't be of the same type: this is intended to sync remote and local storage. In the future, first will be, for example, an ICollection<DummyRemoteItem> and second will be a List<IO.FileSystemInfo>. But I want it to be more generic. To make it work with different collection types, I think I would add another type parameter and a Func<T1, T2, bool> for comparing items. That would be the best approach, right?

Generally, what's the best way to implement the insides of

Synchronize<T>(IEnumerable<T> first,IEnumerable<T> second,Action<T> firstSynchronizer,Action<T> secondSynchronizer) 

and

Synchronize<TFirst, TSecond>(IEnumerable<TFirst> first,IEnumerable<TSecond> second,Action<TFirst> firstSynchronizer,Action<TSecond> secondSynchronizer, Func<TFirst, TSecond, bool> predicate)

You can use the Except and Intersect methods to find the differing or the identical items between two enumerable sources. MSDN has a lot of resources on both Except and Intersect.

As for the comment about Distinct() being an expensive performance hit, I suspect the hit is minor, and trying to optimize it now would be premature.
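As a minimal sketch of the Except suggestion, the single-type Synchronize from the question could be written as below. The class name ExceptExample is only for illustration; the rest follows the question's signature. Both differences are materialized with ToList() before any callback runs, so the callbacks may modify the original lists without the caller copying them first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptExample
{
    public static void Synchronize<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        Action<T> firstSynchronizer,
        Action<T> secondSynchronizer)
    {
        // Except is a set difference: it yields each distinct element of the
        // left sequence that does not occur in the right sequence.
        // ToList() forces evaluation now, before the callbacks change anything.
        var onlyInFirst = first.Except(second).ToList();
        var onlyInSecond = second.Except(first).ToList();

        foreach (var item in onlyInFirst) firstSynchronizer(item);
        foreach (var item in onlyInSecond) secondSynchronizer(item);
    }
}
```

With the lists from the question, calling ExceptExample.Synchronize(a, b, t => b.Add(t), t => a.Add(t)) adds "b", "e", "f" and "j" to b, and "h" and "i" to a.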

I guess the most elegant way would be to use sets. In .NET, HashSet might be what you are looking for:

http://msdn.microsoft.com/en-us/library/bb495294.aspx
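A set-based version might look like the following sketch. The class name HashSetExample and the optional comparer parameter are additions made for illustration and are not part of the question's signature. Building each HashSet removes duplicates and makes every Contains check a hash lookup, so the whole pass is roughly linear in the size of the two inputs.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetExample
{
    public static void Synchronize<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        Action<T> firstSynchronizer,
        Action<T> secondSynchronizer,
        IEqualityComparer<T> comparer = null) // null means the default comparer
    {
        // The sets are private copies, so the callbacks can safely modify
        // the original collections while we iterate.
        var firstSet = new HashSet<T>(first, comparer);
        var secondSet = new HashSet<T>(second, comparer);

        foreach (var item in firstSet)
        {
            if (!secondSet.Contains(item)) firstSynchronizer(item);
        }
        foreach (var item in secondSet)
        {
            if (!firstSet.Contains(item)) secondSynchronizer(item);
        }
    }
}
```

Because the sets are copies of the inputs, the ToArray() calls at the call site in the question should no longer be needed with this variant.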

A LINQ full outer join is your friend here.

Here's an implementation:

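// Must be declared inside a static class; needs System, System.Collections.Generic and System.Linq.
public static IEnumerable<Tuple<T1, T2>> FullOuterJoin<T1, T2>
   (this IEnumerable<T1> one, IEnumerable<T2> two, Func<T1,T2,bool> match)
{
 // Every item of "one", paired with its matches in "two" (or default if none).
 var left = from a in one
   from b in two.Where(y => match(a, y)).DefaultIfEmpty()
   select new Tuple<T1, T2>(a, b);

 // Every item of "two", paired with its matches in "one" (or default if none).
 var right = from b in two
   from a in one.Where(x => match(x, b)).DefaultIfEmpty()
   select new Tuple<T1, T2>(a, b);

 // Matched pairs appear in both halves; Distinct() removes the repeats.
 return left.Concat(right).Distinct();
}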

so:

a.FullOuterJoin(b, (x, y) => x == y)

and look for nulls in the resulting tuples: a null Item2 means the item exists only in a, and a null Item1 means it exists only in b.
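The null check could be wired into the two-type Synchronize signature from the question roughly as follows. This is a sketch under a few assumptions: the class name FullOuterJoinExample is made up, both element types are constrained to reference types so that null unambiguously means "no match", and neither collection contains null items. The join is repeated here in method syntax only to keep the block self-contained. Note that it compares every pair, so the cost grows with the product of the two collection sizes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FullOuterJoinExample
{
    public static IEnumerable<Tuple<T1, T2>> FullOuterJoin<T1, T2>(
        this IEnumerable<T1> one, IEnumerable<T2> two, Func<T1, T2, bool> match)
        where T1 : class
        where T2 : class
    {
        // Items of "one" with their matches in "two" (null when unmatched).
        var left = one.SelectMany(
            x => two.Where(y => match(x, y)).DefaultIfEmpty(),
            (x, y) => Tuple.Create(x, y));
        // Items of "two" with their matches in "one" (null when unmatched).
        var right = two.SelectMany(
            y => one.Where(x => match(x, y)).DefaultIfEmpty(),
            (y, x) => Tuple.Create(x, y));
        return left.Concat(right).Distinct();
    }

    public static void Synchronize<TFirst, TSecond>(
        IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Action<TFirst> firstSynchronizer,
        Action<TSecond> secondSynchronizer,
        Func<TFirst, TSecond, bool> predicate)
        where TFirst : class
        where TSecond : class
    {
        // ToList() evaluates the lazy join before any callback runs.
        foreach (var pair in first.FullOuterJoin(second, predicate).ToList())
        {
            if (pair.Item2 == null) firstSynchronizer(pair.Item1);       // only in first
            else if (pair.Item1 == null) secondSynchronizer(pair.Item2); // only in second
        }
    }
}
```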

The following could be used if the items in collections are of two different types:

 class CollectionSynchronizer<TSource, TDestination>
    {
        public Func<TSource, TDestination, bool> CompareFunc { get; set; }
        public Action<TDestination> RemoveAction { get; set; }
        public Action<TSource> AddAction { get; set; }
        public Action<TSource, TDestination> UpdateAction { get; set; }

        public void Synchronizer(ICollection<TSource> sourceItems, ICollection<TDestination> destinationItems)
        {
            // Remove items not in source from destination
            RemoveItems(sourceItems, destinationItems);

            // Add items in source to destination 
            AddOrUpdateItems(sourceItems, destinationItems);
        }

        private void RemoveItems(ICollection<TSource> sourceCollection, ICollection<TDestination> destinationCollection)
        {
            foreach (var destinationItem in destinationCollection.ToArray())
            {
                // Any() instead of FirstOrDefault() == null, so the check also works when TSource is a value type.
                if (!sourceCollection.Any(item => CompareFunc(item, destinationItem)))
                {
                    RemoveAction(destinationItem);
                }
            }
        }

        private void AddOrUpdateItems(ICollection<TSource> sourceCollection, ICollection<TDestination> destinationCollection)
        {
            var destinationList = destinationCollection.ToList();
            foreach (var sourceItem in sourceCollection)
            {
                // Note: the null check below assumes TDestination is a reference type.
                var destinationItem = destinationList.FirstOrDefault(item => CompareFunc(sourceItem, item));

                if (destinationItem == null)
                {
                    AddAction(sourceItem);
                }
                else
                {
                    UpdateAction(sourceItem, destinationItem);
                }
            }
        }
    }

And the usage would be like this:

var collectionSynchronizer = new CollectionSynchronizer<string, ContentImageEntity>
            {
                CompareFunc = (communityImage, contentImage) => communityImage == contentImage.Name,
                AddAction = sourceItem =>
                {
                    var contentEntityImage = _contentImageProvider.Create(sourceItem);
                    contentEntityImages.Add(contentEntityImage);
                },
                UpdateAction = (communityImage, contentImage) =>
                {
                    _contentImageProvider.Update(contentImage);
                },
                RemoveAction = contentImage =>
                {
                    contentEntityImages.Remove(contentImage);
                }
            };

            collectionSynchronizer.Synchronizer(externalContentImages, contentEntityImages);
