
C# Generic List<T> update items

I'm using a List<T> and I need to update the properties of the objects it holds.

What would be the most efficient/fastest way to do this? I know that scanning through the indexes of a List<T> gets slower as the list grows, and that List<T> is not the most efficient collection for updates.

That said, would it be better to:

  • Remove the matched object, then add a new one?
  • Scan through the list indexes until you find the matching object, and then update the object's properties?
  • If I have a collection of updates, say an IEnumerable<Product>, and I want to apply it to the List, what would be the best approach?

Stub code sample:

public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
    public string Category { get; set; }
}

public class ProductRepository
{
    List<Product> product = Product.GetProduct();
    public void UpdateProducts(IEnumerable<Product> updatedProduct)
    {
    }
    public void UpdateProduct(Product updatedProduct)
    {
    }
}

You could consider using a Dictionary instead of a List if you want fast lookups. The key in your case would be the product ID (which I am assuming is unique). See Dictionary<TKey,TValue> on MSDN.

For example:

using System.Collections.Generic;
using System.Linq;

public class ProductRepository
{
    // Key the products by their unique ProductId for O(1) lookups.
    Dictionary<int, Product> products =
        Product.GetProduct().ToDictionary(p => p.ProductId);

    public void UpdateProducts(IEnumerable<Product> updatedProducts)
    {
        foreach (var productToUpdate in updatedProducts)
        {
            UpdateProduct(productToUpdate);
        }
    }

    public void UpdateProduct(Product productToUpdate)
    {
        // Look up the product by its ID.
        if (products.TryGetValue(productToUpdate.ProductId, out var product))
        {
            // Update code here...
            product.ProductName = productToUpdate.ProductName;
            product.Category = productToUpdate.Category;
        }
        else
        {
            // Add it instead (or throw an exception if you prefer).
            products.Add(productToUpdate.ProductId, productToUpdate);
        }
    }
}
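For example, a hypothetical usage of the repository above (the product values are made up):

var repository = new ProductRepository();

// Known IDs get their properties updated; unknown IDs are added.
repository.UpdateProducts(new[]
{
    new Product { ProductId = 1, ProductName = "Widget v2", Category = "Tools" },
    new Product { ProductId = 2, ProductName = "Gadget", Category = "Toys" }
});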

What exactly is efficiency?

Unless there are literally thousands of items, a foreach, a for, or any other type of looping operation will most likely show differences of only milliseconds. Really. Hence you have wasted more time (in the cost of a programmer at $XX per hour, which far exceeds what an end user's time costs) trying to find that "best" approach.

So if you do have literally thousands of records, I would recommend finding efficiency by processing the list in parallel with the Parallel.ForEach method, which can process more records at once to save time, at the cost of some threading overhead.
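As a minimal sketch of that idea (assuming the Product class from the question; the name-trimming update is just a placeholder):

using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelUpdateExample
{
    public static void UpdateAll(List<Product> products)
    {
        // Safe to run in parallel: each iteration mutates a different
        // Product instance and writes no shared state.
        Parallel.ForEach(products, product =>
        {
            product.ProductName = product.ProductName?.Trim();
        });
    }
}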


IMHO if the record count is greater than 100, it implies that a database is being used. If a database is involved, write an update sproc and call it a day; I would be hard pressed to justify a one-off program for a specific update which could be done more easily in said database.

Your use case is updating a List<T> which may contain millions of records, where the updated records can be a sub-list or just a single record.

The schema is as follows:

public class Product
{
    public int ProductId { get; set; }
    public string ProductName { get; set; }
    public string Category { get; set; }
}

Does Product contain a primary key? That would mean every Product object can be uniquely identified, there are no duplicates, and every update targets a single unique record.

If yes, then it is best to arrange the List<T> in the form of a Dictionary<int,T>, which means every update from the IEnumerable<T> has O(1) time complexity; all the updates together then cost in proportion to the size of the IEnumerable<T>, which I don't expect to be very big. There is extra memory allocation for a different data structure, but it would be a very fast solution. @JamieLupton has already provided a solution along similar lines; a compact sketch follows.
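A minimal sketch of that arrangement (assuming ProductId is the unique key, and that products and updates are the List<T> and IEnumerable<T> in question):

using System.Collections.Generic;
using System.Linq;

public static class DictionaryUpdateExample
{
    public static void Apply(List<Product> products, IEnumerable<Product> updates)
    {
        // Build the lookup once: O(N).
        Dictionary<int, Product> byId = products.ToDictionary(p => p.ProductId);

        // Each of the M updates is then a single O(1) lookup.
        foreach (var update in updates)
        {
            if (byId.TryGetValue(update.ProductId, out var existing))
                existing.ProductName = update.ProductName;
        }
    }
}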

In case Product is repeated and there is no primary key, the above solution is not valid; the ideal way to scan through the List<T> is then binary search (on a list sorted by the search key), whose time complexity is O(log N).

Now since the size of the IEnumerable<T> is comparatively small, say M, the overall time complexity would be O(M * log N), where M is much smaller than N and can be neglected.

List<T> supports a BinarySearch API, which returns the element's index; that index can then be used to update the object in place. Check the example here, and the sketch below.
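A minimal sketch of List<T>.BinarySearch with a comparer on ProductId (the ProductIdComparer type is my own addition; the list must already be sorted by the same key):

using System.Collections.Generic;

public class ProductIdComparer : IComparer<Product>
{
    public int Compare(Product x, Product y) =>
        x.ProductId.CompareTo(y.ProductId);
}

public static class BinarySearchExample
{
    public static void UpdateName(List<Product> products, int productId, string newName)
    {
        var comparer = new ProductIdComparer();
        products.Sort(comparer); // prerequisite; in practice, sort once up front

        var probe = new Product { ProductId = productId }; // carries the search key only
        int index = products.BinarySearch(probe, comparer); // O(log N)
        if (index >= 0)
            products[index].ProductName = newName;
    }
}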

The best option, in my opinion, for such a high number of records would be parallel processing combined with binary search.

Now, since thread safety is an issue, what I normally do is divide the List<T> into a List<T>[], since each unit can then be assigned to a separate thread. A simple way is to use the MoreLinq Batch API: fetch the number of system processors using Environment.ProcessorCount and then create an IEnumerable<IEnumerable<T>> as follows:

// MoreLinq's Batch(size) yields batches of `size` elements, so to get roughly
// one batch per processor, batch by Count / ProcessorCount (at least 1).
var batchSize = Math.Max(1, products.Count / Environment.ProcessorCount);
var enumerableList = products.Batch(batchSize).ToList();

Another way is the following custom code:

using System;
using System.Collections.Generic;

public static class MyExtensions
{
    // data - the List<T> to split
    // dataCount - calculated once by the caller to avoid re-reading Count every time
    // partitionSize - size of each partition, e.g. a function of the number of processors
    public static List<T>[] SplitList<T>(this List<T> data, int dataCount, int partitionSize)
    {
        int remainderData;
        var fullPartition = Math.DivRem(dataCount, partitionSize, out remainderData);

        // Fewer items than one partition: return everything as a single partition.
        if (fullPartition == 0)
            return new[] { new List<T>(data) };

        var listArray = new List<T>[fullPartition];
        var beginIndex = 0;

        for (var partitionCounter = 0; partitionCounter < fullPartition; partitionCounter++)
        {
            // The last partition absorbs the remainder.
            if (partitionCounter == fullPartition - 1)
                listArray[partitionCounter] = data.GetRange(beginIndex, partitionSize + remainderData);
            else
                listArray[partitionCounter] = data.GetRange(beginIndex, partitionSize);
            beginIndex += partitionSize;
        }
        return listArray;
    }
}
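Hypothetical usage, splitting into roughly one partition per processor (products is the list in question):

var partitionSize = Math.Max(1, products.Count / Environment.ProcessorCount);
List<Product>[] partitions = products.SplitList(products.Count, partitionSize);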

Now you can create a Task[] where each Task is assigned one List<T> from the List<T>[] generated above, and binary search within each sub-partition. Though it is repetitive (every update is searched in every partition), it harnesses both parallel processing and binary search. Each Task can be started, and then we can wait for all of them to finish using Task.WaitAll(taskArray), as sketched below.
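A minimal sketch of that scheme, reusing the ProductIdComparer from the earlier sketch (each partition must already be sorted by ProductId; the tasks share no mutable state):

using System.Collections.Generic;
using System.Threading.Tasks;

public static class PartitionedSearchExample
{
    public static void Apply(List<Product>[] partitions, List<Product> updates)
    {
        var comparer = new ProductIdComparer();
        var taskArray = new Task[partitions.Length];

        for (var i = 0; i < partitions.Length; i++)
        {
            var partition = partitions[i]; // capture a stable local for the closure
            taskArray[i] = Task.Run(() =>
            {
                foreach (var update in updates)
                {
                    // Only the partition that actually holds the ID will match.
                    int index = partition.BinarySearch(update, comparer);
                    if (index >= 0)
                        partition[index].ProductName = update.ProductName;
                }
            });
        }

        Task.WaitAll(taskArray); // block until every partition has been processed
    }
}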

Over and above that, if you create a Dictionary<int,T>[] instead and use the same parallel processing, it would be the fastest.

Final integration of the List<T>[] back into a List<T> can be done using LINQ Aggregate or SelectMany, as follows:

List<T>[] splitListArray = /* fetch splitListArray */;

// Process splitListArray

var finalList = splitListArray.SelectMany(obj => obj).ToList();

Another option would be to use Parallel.ForEach along with a thread-safe data structure like ConcurrentBag<T>, or perhaps ConcurrentDictionary<int,T> in case you are replacing complete objects; if it's just a property update, a simple List<T> works. Parallel.ForEach internally uses a range partitioner similar to what I have suggested above. A sketch of the dictionary variant follows.
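A minimal sketch of the whole-object-replacement variant (assuming ProductId is unique; products and updatedProducts are the collections in question):

using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Build a thread-safe map keyed by ProductId, then replace objects in parallel.
var byId = new ConcurrentDictionary<int, Product>(
    products.ToDictionary(p => p.ProductId));

Parallel.ForEach(updatedProducts, update =>
{
    byId[update.ProductId] = update; // the indexer set is atomic per key
});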

Which of the solutions above is ideal depends on your use case; you should be able to combine them to achieve the best possible result. Let me know in case you need a specific example.
