
Create Batches from Object collection

I have a collection of objects (more than 500) that I would like to split into batches, for example batches of 50, so that 500 objects yield 10 batches (10 × 50 = 500).

I am using the logic below:

public class CustomEngineReader : IEnumerable<List<EngineToken>>
{
    List<EngineToken> EngineTokens;
    int _batchSize = 1;

    public CustomEngineReader(List<EngineToken> tokens, int batchSize)
    {
        EngineTokens = tokens;
        if (batchSize > 0)
        {
            _batchSize = batchSize;
        }
    }

    public IEnumerator<List<EngineToken>> GetEnumerator()
    {
        foreach (var item in EngineTokens)
        {
            int i = 0;
            List<EngineToken> batch = new List<EngineToken>();

            while (i < _batchSize && item != null)
            {
                batch.Add(item);
                i++;
            }

            if (batch.Count != 0)
            {
                yield return batch;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

I am using the above code as follows:

CustomEngineReader reader = new CustomEngineReader (this.tokencollection, 50);
foreach(List<EngineToken> items in reader)
{
       //in each iteration we get batch specified objects
       foreach(EngineToken item in items)
       {
          //Process
       }    
}

It's not working.

Having a while loop inside your foreach means that you add the same item over and over again until you hit your batch size.

You also need to create your batch object outside the foreach, so that you can add multiple items from the loop to it.

Personally, I prefer to write this as an extension method rather than a separate class, in keeping with the LINQ style of programming. It can also trivially be made generic, which greatly improves its usefulness. My implementation of Batch is:

public static IEnumerable<IEnumerable<T>> Batch<T>(
    this IEnumerable<T> source, int batchSize)
{
    List<T> buffer = new List<T>(batchSize);

    foreach (T item in source)
    {
        buffer.Add(item);

        if (buffer.Count >= batchSize)
        {
            yield return buffer;
            buffer = new List<T>(batchSize);
        }
    }
    if (buffer.Count > 0)
    {
        yield return buffer;
    }
}
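
A minimal usage sketch of the extension method above, assuming 500 integers stand in for the EngineToken objects from the question. The class names BatchExtensions and BatchDemo are hypothetical; the Batch method is repeated only so the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchExtensions
{
    // Same Batch method as above, repeated so this sketch is self-contained.
    public static IEnumerable<IEnumerable<T>> Batch<T>(
        this IEnumerable<T> source, int batchSize)
    {
        List<T> buffer = new List<T>(batchSize);
        foreach (T item in source)
        {
            buffer.Add(item);
            if (buffer.Count >= batchSize)
            {
                yield return buffer;
                buffer = new List<T>(batchSize);
            }
        }
        if (buffer.Count > 0)
        {
            yield return buffer;
        }
    }
}

public static class BatchDemo
{
    public static void Main()
    {
        // Placeholder data: integers instead of EngineToken objects.
        List<int> tokens = Enumerable.Range(1, 500).ToList();

        int batchCount = 0;
        foreach (IEnumerable<int> batch in tokens.Batch(50))
        {
            batchCount++;
            Console.WriteLine($"Batch {batchCount}: {batch.Count()} items");
        }

        // 500 items in batches of 50 gives 10 batches.
        Console.WriteLine($"Total batches: {batchCount}");
    }
}
```

If the collection size is not a multiple of the batch size, the last batch is simply shorter. On .NET 6 or later, the built-in Enumerable.Chunk method covers the same use case and may be preferable to a hand-written extension.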

You are adding the same item multiple times in a loop. Try:

public IEnumerator<List<EngineToken>> GetEnumerator()
{
    int i = 0;
    List<EngineToken> batch = new List<EngineToken>();
    foreach (var item in EngineTokens)
    {
        batch.Add(item);
        i++;
        // Once the batch is full, hand it out and start a new one.
        if (i == _batchSize)
        {
            yield return batch;
            batch = new List<EngineToken>();
            i = 0;
        }
    }
    // Emit the final, partially filled batch, if any.
    if (batch.Count != 0)
    {
        yield return batch;
    }
}

Here is a simpler implementation. It is not thread-safe; for thread safety you will need to add locks as necessary.

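public IEnumerator<List<EngineToken>> GetEnumerator()
{
    // Requires: using System.Linq;
    // Note: this consumes EngineTokens, so the collection is empty afterwards.
    while (EngineTokens.Count > 0)
    {
        List<EngineToken> currentBatch = EngineTokens.Take(_batchSize).ToList();
        EngineTokens = EngineTokens.Skip(_batchSize).ToList();
        yield return currentBatch;
    }
}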
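
One way the locking mentioned above could look, as a rough sketch only: the LockedBatchSource class and its TakeBatch method are hypothetical names, not part of the original answer. The sketch applies the same Take/Skip idea, but guards the shared list with a lock so that several threads can each pull a distinct batch.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (an assumption for illustration): hands out batches under a lock.
public class LockedBatchSource<T>
{
    private readonly object _gate = new object();
    private readonly int _batchSize;
    private List<T> _remaining;

    public LockedBatchSource(IEnumerable<T> items, int batchSize)
    {
        _remaining = items.ToList();
        // Fall back to a batch size of 1, as the question's class does.
        _batchSize = batchSize > 0 ? batchSize : 1;
    }

    // Returns the next batch, or an empty list once everything is consumed.
    public List<T> TakeBatch()
    {
        lock (_gate)
        {
            List<T> batch = _remaining.Take(_batchSize).ToList();
            _remaining = _remaining.Skip(_batchSize).ToList();
            return batch;
        }
    }
}
```

Each caller loops on TakeBatch until it receives an empty list. Exposing a method rather than an enumerator keeps the lock scope small and avoids holding it across a yield.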
