
C# comparing two large lists of items by a specific property

I have two large lists of items whose class looks like this (both lists are of the same type):

public class Items
{
 public string ItemID { get; set; }
 public int QuantitySold { get; set; }
}


var oldList = new List<Items>(); // oldList

var newList = new List<Items>(); // new list

The old list contains items from the database and the new list represents items fetched from an API.

Both lists can be very large, with 10000+ items in each (20000 total).

I need to compare the items in newList against the items in oldList, find the items that have the same ItemID but a different QuantitySold, and store those in a third list called differentQuantityItems.

I could simply use a nested foreach over both lists and compare the values, but since both lists are large, the performance of a nested foreach loop is terrible and I can't use it.

Can someone help me out with this?

@YamamotoTetsua I'm already using an IEqualityComparer to get the desired result, but it doesn't give the results I expect. Here is why. My first IEqualityComparer looks like this:

 public class MissingItemComparer : IEqualityComparer<SearchedUserItems>
    {
        public static readonly IEqualityComparer<SearchedUserItems> Instance = new MissingItemComparer();

        public bool Equals(SearchedUserItems x, SearchedUserItems y)
        {
            return x.ItemID == y.ItemID;
        }

        public int GetHashCode(SearchedUserItems x)
        {
            return x.ItemID.GetHashCode();
        }
    }

This comparer gives me the items from newList that are not present in my database, like this:

var missingItems= newItems.Except(competitor.SearchedUserItems.ToList(), MissingItemComparer.Instance).ToList();

This list now contains the items that are new from the API and not present in my DB.

The second IEqualityComparer is based on differing QuantitySold values between the old and new lists:

public class ItemsComparer : IEqualityComparer<SearchedUserItems>
    {
        public static readonly IEqualityComparer<SearchedUserItems> Instance = new ItemsComparer();
        public bool Equals(SearchedUserItems x, SearchedUserItems y)
        {
            return (x.QuantitySold == y.QuantitySold);
        }
        public int GetHashCode(SearchedUserItems x)
        {
            return x.ItemID.GetHashCode();
        }
    }

Usage example:

var differentQuantityItems = newItems.Except(competitor.SearchedUserItems.ToList(), ItemsComparer.Instance).ToList();

The issue with these two equality comparers is that the first one will, for example, return these ItemIDs as missing:

123124124

123124421

512095902

And they are indeed missing from my oldList. However, the second IEqualityComparer also returns these items as differentQuantityItems. Their quantities do differ, but they aren't present in oldList, so they shouldn't be included in the second list.

This is a perfect candidate for a LINQ join:

var differentQuantityItems =
    (from newItem in newList
     join oldItem in oldList on newItem.ItemID equals oldItem.ItemID
     where newItem.QuantitySold != oldItem.QuantitySold
     select newItem).ToList();

This returns all new items that have a corresponding old item with a different QuantitySold. If you also want to include new items that have no corresponding old item, use a left outer join:

var differentQuantityItems =
    (from newItem in newList
     join oldItem in oldList on newItem.ItemID equals oldItem.ItemID into oldItems
     from oldItem in oldItems.DefaultIfEmpty()
     where oldItem == null || newItem.QuantitySold != oldItem.QuantitySold
     select newItem).ToList();

In both cases, the join operator quickly correlates the items with the same ItemID. You can then compare QuantitySold or any other property.
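Conceptually, the join builds a hash-based lookup over oldList and probes it once per item of newList, so the work is roughly proportional to n + m rather than n * m. A minimal sketch of the same idea written by hand with a Dictionary, assuming ItemID is unique within oldList (the names QuantityDiff and FindChanged are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

public class Items
{
    public string ItemID { get; set; }
    public int QuantitySold { get; set; }
}

public static class QuantityDiff
{
    // Hypothetical helper: returns the items of newList whose ItemID also
    // occurs in oldList but with a different QuantitySold.
    public static List<Items> FindChanged(List<Items> oldList, List<Items> newList)
    {
        // Index the old quantities by ItemID (assumption: IDs are unique;
        // if an ID repeats, the last occurrence wins).
        var oldQuantityById = new Dictionary<string, int>(oldList.Count);
        foreach (var oldItem in oldList)
            oldQuantityById[oldItem.ItemID] = oldItem.QuantitySold;

        var changed = new List<Items>();
        foreach (var newItem in newList)
        {
            // Items missing from oldList fail TryGetValue and are skipped.
            if (oldQuantityById.TryGetValue(newItem.ItemID, out var oldQuantity)
                && oldQuantity != newItem.QuantitySold)
            {
                changed.Add(newItem);
            }
        }
        return changed;
    }
}
```

Unlike the join, this version does not produce one result per matching pair when an ItemID occurs more than once, so it only fits data where the IDs are unique.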

From a big-O complexity point of view, comparing the lists in a nested for loop is O(n*m), where n is the size of the list from the DB and m is the size of the list fetched from the API.

To improve performance, you can sort the two lists, which costs O(n log(n) + m log(m)), and then find the matching items in a single O(n + m) pass. The overall complexity of the algorithm is therefore O(n log(n) + m log(m)).
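The sort-then-scan idea could look roughly like the sketch below. It assumes that ItemID is non-null and unique within each list and that an ordinal string comparison is an acceptable ordering; the names SortedDiff and FindChanged are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Items
{
    public string ItemID { get; set; }
    public int QuantitySold { get; set; }
}

public static class SortedDiff
{
    // Hypothetical helper: returns the new items whose ItemID exists in the
    // old list with a different QuantitySold.
    public static List<Items> FindChanged(IEnumerable<Items> oldItems, IEnumerable<Items> newItems)
    {
        // Sort copies of both lists by ItemID: O(n log n + m log m).
        var oldSorted = oldItems.OrderBy(i => i.ItemID, StringComparer.Ordinal).ToArray();
        var newSorted = newItems.OrderBy(i => i.ItemID, StringComparer.Ordinal).ToArray();

        var changed = new List<Items>();
        int o = 0, n = 0;

        // Walk both sorted arrays in lockstep: O(n + m).
        while (o < oldSorted.Length && n < newSorted.Length)
        {
            int cmp = string.CompareOrdinal(oldSorted[o].ItemID, newSorted[n].ItemID);
            if (cmp < 0)
            {
                o++; // ID exists only in the old list
            }
            else if (cmp > 0)
            {
                n++; // ID exists only in the new list
            }
            else
            {
                if (oldSorted[o].QuantitySold != newSorted[n].QuantitySold)
                    changed.Add(newSorted[n]);
                o++;
                n++;
            }
        }
        return changed;
    }
}
```

The same ordinal ordering is used for sorting and for the merge comparison; mixing a culture-sensitive sort with an ordinal comparison could make the scan skip matches.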

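To give an idea of the difference: with n = m = 10,000, the nested loop performs up to 100,000,000 comparisons, while sorting both lists takes on the order of 2 * 10,000 * log2(10,000), or roughly 270,000, comparisons, followed by a merge pass of at most 20,000 steps.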

This code runs in less than a second, even if there are no matches at all (and also in less than a second if everything is a match).

It returns all items that exist in both lists (i.e. same ItemID) but with a different QuantitySold.

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApp5
{
    class Program
    {
        public class Items
        {
            public string ItemID { get; set; }
            public int QuantitySold { get; set; }
        }

        static void Main(string[] args)
        {
            // Sample data
            var oldList = new List<Items>();
            oldList.AddRange(Enumerable.Range(0, 20000).Select(z => new Items() { ItemID = z.ToString(), QuantitySold = 4 }));

            var newList = new List<Items>();
            newList.AddRange(Enumerable.Range(0, 20000).Select(z => new Items() { ItemID = z.ToString(), QuantitySold = 5 }));

            var results = oldList.Join(newList,
                                            left => left.ItemID,
                                            right => right.ItemID,
                                            (left, right) => new { left, right })
                                .Where(z => z.left.QuantitySold != z.right.QuantitySold).Select(z => z.left);

            Console.WriteLine(results.Count());
            Console.ReadLine();
        }
    }
}

Selecting z.left returns only the item from oldList. If you want both the old and the new item, use this instead:

var results = oldList.Join(newList,
                                left => left.ItemID,
                                right => right.ItemID,
                                (left, right) => new { left, right })
                    .Where(z => z.left.QuantitySold != z.right.QuantitySold)
                    .Select(z => new[] { z.left, z.right })
                    .SelectMany(z => z);
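Flattening with SelectMany loses the association between each old item and its new counterpart. If the pairing matters, one possible variation is to keep each match as a value tuple. This is only a sketch: it requires C# 7.0 or later, and the names PairedDiff and FindChangedPairs are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Items
{
    public string ItemID { get; set; }
    public int QuantitySold { get; set; }
}

public static class PairedDiff
{
    // Hypothetical helper: one (Old, New) tuple per matching ItemID whose
    // QuantitySold differs between the two lists.
    public static List<(Items Old, Items New)> FindChangedPairs(
        IEnumerable<Items> oldList, IEnumerable<Items> newList)
    {
        return oldList.Join(newList,
                            left => left.ItemID,
                            right => right.ItemID,
                            (left, right) => (Old: left, New: right))
                      .Where(pair => pair.Old.QuantitySold != pair.New.QuantitySold)
                      .ToList();
    }
}
```

Each element can then be read as pair.Old and pair.New, for example to log the quantity before and after the change.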

Another option is to use Except with a custom IEqualityComparer, something like the code below. Note that the result also contains the new items whose ItemID does not occur in oldList at all:

var oldList = new List<Item>(); // oldList
var newList = new List<Item>(); // new list
var distinctList = newList.Except(oldList,new ItemEqualityComparer()).ToList();



class ItemEqualityComparer : IEqualityComparer<Item>
        {
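            public bool Equals(Item i1, Item i2)
            {
                // Equal only when both the ID and the quantity match, which
                // keeps Equals consistent with GetHashCode (hashed on ItemID).
                return i1.ItemID == i2.ItemID && i1.QuantitySold == i2.QuantitySold;
            }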

            public int GetHashCode(Item item)
            {
                return item.ItemID.GetHashCode();
            }
        }

public class Item
        {
            public string ItemID { get; set; }
            public int QuantitySold { get; set; }
        }


 