How can I convert below code to LINQ. Lists can sometimes contains 20k or 30k items. So I'm looking for something that improves performance and runs faster. Below is my code:
if(list1 != null)
{
foreach (var item in list1)
{
if(!list2.Any( x => x.Name == item.Name && x.Number == item.Number))
{
list2.Add(item)
}
}
}
I tried using Parallel.ForEach but it throws a "Collection was modified" error.
You can use the LINQ Distinct
method. It needs an IEqualityComparer set up, but luckily the MSDN example has just what you need already written:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Diagnostics;
namespace ConsoleApplication1
{
class Program
{
static Random rand = new Random();
// see https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.distinct for Distinct()
public class Product
{
public string Name { get; set; }
public int Number { get; set; }
}
// Custom comparer for the Product class
class ProductComparer : IEqualityComparer<Product>
{
// Products are equal if their names and product numbers are equal.
public bool Equals(Product x, Product y)
{
//Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y)) return true;
//Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
//Check whether the products' properties are equal.
return x.Number == y.Number && x.Name == y.Name;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(Product product)
{
//Check whether the object is null
if (Object.ReferenceEquals(product, null)) return 0;
//Get hash code for the Name field if it is not null.
int hashProductName = product.Name == null ? 0 : product.Name.GetHashCode();
//Get hash code for the Code field.
int hashProductCode = product.Number.GetHashCode();
//Calculate the hash code for the product.
return hashProductName ^ hashProductCode;
}
}
static string RandomLetter()
{
return (rand.Next((int)'A', (int)'Z' + 1)).ToString();
}
static List<Product> CreateTestData()
{
int nItems = 20000;
List<Product> data = new List<Product>(nItems);
for (int i = 1; i <= nItems; i++)
{
data.Add(new Product { Name = RandomLetter() + RandomLetter(), Number = i % 10 });
}
return data;
}
static void Main(string[] args)
{
var list1 = CreateTestData();
Stopwatch sw = new Stopwatch();
sw.Start();
List<Product> noduplicates = list1.Distinct(new ProductComparer()).ToList();
sw.Stop();
Console.WriteLine($"x items: {list1.Count()} no duplicates: {noduplicates.Count()} Time: {sw.ElapsedMilliseconds} ms");
List<Product> list2 = new List<Product>();
if (list1 != null)
{
sw.Restart();
foreach (var item in list1)
{
if (!list2.Any(x => x.Name == item.Name && x.Number == item.Number))
{
list2.Add(item);
}
}
sw.Stop();
Console.WriteLine($"x items: {list1.Count()} list2: {noduplicates.Count()} Time: {sw.ElapsedMilliseconds} ms");
}
Console.ReadLine();
}
}
}
Sample output:
x items: 20000 no duplicates: 6393 Time: 12 ms
x items: 20000 list2: 6393 Time: 4225 ms
If you already had some data, you could use the Union
method instead, again using the comparer.
NB My RandomLetter()
function does not do what I intended. But it suffices.
20 - 30k items are not so much. All you need is to replace the potentially slow linear search
list2.Any(x => x.Name == item.Name && x.Number == item.Number)
with fast lookup data structure.
The easiest is to build a HashSet
with anonymous type containing the Name
and Number
properties. In order to do that, you can use the following handy custom extension method:
public static class Extensions
{
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer = null)
{
return new HashSet<T>(source, comparer);
}
}
and the code in question would be like this:
if (list1 != null)
{
var keys = list2.Select(item => new { item.Name, item.Number }).ToHashSet();
foreach (var item in list1)
{
var key = new { item.Name, item.Number };
if (!keys.Contains(key))
{
list2.Add(item);
keys.Add(key);
}
}
}
This is not LINQ, but it doesn't need to be, since LINQ is for querying, while your code is for modification.
You can make list2 a ConcurrentBag type and do like this. im not 100% sure that it will work as intended though.
public class Item
{
public string Name { get; set; }
public int Number { get; set; }
}
public void test()
{
var list1 = new List<Item>(); // add items to list1 or maybe load from a database?
var list2 = new ConcurrentBag<Item>();
Parallel.ForEach(list1.ToList(), (item, state, arg3) =>
{
if (!list2.Any(x => x.Name == item.Name && x.Number == item.Number))
{
list2.Add(item);
}
});
}
Group by your distiction and select the first record from the group and create your list.
Fastest if you don't need to provide existing values
var list2 = list1.GroupBy(i => new { i.Name, i.Number })
.Select(g=>g.First())
.ToList();
Simple if you have existing values (4 times slower than the next version)
if you list2
has preexisting values you can do something like this...
var keys = list2.ToList();
var toadd = list1.GroupBy(i => new { i.Name, i.Number })
.Where(g => !keys.Any(i => i.Name == g.Key.Name
&& i.Number == g.Key.Number))
.Select(g=>g.First());
list2.AddRange(toadd);
Fastest if you need to update a set with existing values
public static HashSet<T> ToHashSet<T>(this IEnumerable<T> items)
{
return new HashSet<T>(items);
}
var keys = list2.Select(i => new { i.Name, i.Number }).ToHashSet();
var toadd = list1.GroupBy(i => new { i.Name, i.Number })
.Where(g => !keys.Contains(g.Key))
.Select(g => g.First());
list2.AddRange(toadd);
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.