Best way to convert a thread-safe collection to a DataTable?

So here is the scenario:

I have to take a group of data, process it, build an object from each item, and then insert those objects into a database.

To increase performance, I am multi-threading the processing of the data using a parallel loop and storing the objects in a ConcurrentBag.

That part works fine. However, I now need to take that collection, convert it into a DataTable object and insert the data into the database. It's very ugly, and I feel like I'm not doing this in the best way possible (pseudocode below):

ConcurrentBag<FinalObject> bag = new ConcurrentBag<FinalObject>();

ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = Environment.ProcessorCount;

Parallel.ForEach(allData, parallelOptions, dataObj =>
{   
    // ... process dataObj into theData (a FinalObject) ...

    bag.Add(theData);

    Thread.Sleep(100);
});

DataTable table = createTable();
foreach(FinalObject moveObj in bag) {
    table.Rows.Add(moveObj.x);
}

This is a good candidate for PLINQ (or Rx; I'll focus on PLINQ since it's part of the Base Class Library).

IEnumerable<FinalObject> bag = allData
    .AsParallel()
    .WithDegreeOfParallelism(Environment.ProcessorCount)
    .Select(dataObj =>
    {
        FinalObject theData = Process(dataObj);

        Thread.Sleep(100);

        return theData;
    });

DataTable table = createTable();

foreach (FinalObject moveObj in bag)
{
    table.Rows.Add(moveObj.x);
}

Realistically, instead of throttling the loop via Thread.Sleep, you should lower the maximum degree of parallelism further until CPU usage drops to the desired level.
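To make that concrete, here is a minimal sketch of the same PLINQ pipeline with no Thread.Sleep, where the only throttle is the degree of parallelism. The class name PlinqToDataTable, the int-based Process stand-in, and the single "x" column are assumptions made for illustration; they are not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class PlinqToDataTable
{
    // Hypothetical stand-in for the real per-item processing.
    static int Process(int input)
    {
        return input * 2;
    }

    public static DataTable Build(IEnumerable<int> allData, int maxDegreeOfParallelism)
    {
        // Process in parallel on at most maxDegreeOfParallelism threads.
        // Result order is not guaranteed unless AsOrdered() is added.
        List<int> results = allData
            .AsParallel()
            .WithDegreeOfParallelism(maxDegreeOfParallelism)
            .Select(item => Process(item))
            .ToList();

        // Fill the DataTable on a single thread; DataTable is not safe for concurrent writes.
        DataTable table = new DataTable();
        table.Columns.Add("x", typeof(int));

        foreach (int value in results)
        {
            table.Rows.Add(value);
        }

        return table;
    }
}
```

You could start with something like PlinqToDataTable.Build(allData, Math.Max(1, Environment.ProcessorCount / 2)) and adjust the number while watching CPU usage.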

Disclaimer: all of the below is meant for entertainment only, although it does actually work.

Of course, you can always kick it up a notch and write a full-on async Parallel.ForEach implementation that lets you process input in parallel and do your throttling asynchronously, without blocking any thread pool threads.

async Task ParallelForEachAsync<TInput, TResult>(IEnumerable<TInput> input,
                                                 int maxDegreeOfParallelism,
                                                 Func<TInput, Task<TResult>> body,
                                                 Action<TResult> onCompleted)
{
    Queue<TInput> queue = new Queue<TInput>(input);

    if (queue.Count == 0) {
        return;
    }

    List<Task<TResult>> tasksInFlight = new List<Task<TResult>>(maxDegreeOfParallelism);

    do
    {
        while (tasksInFlight.Count < maxDegreeOfParallelism && queue.Count != 0)
        {
            TInput item = queue.Dequeue();
            Task<TResult> task = body(item);

            tasksInFlight.Add(task);
        }

        Task<TResult> completedTask = await Task.WhenAny(tasksInFlight).ConfigureAwait(false);

        tasksInFlight.Remove(completedTask);

        TResult result = completedTask.GetAwaiter().GetResult(); // We know the task has completed. No need for await.

        onCompleted(result);
    }
    while (queue.Count != 0 || tasksInFlight.Count != 0);
}

Usage (full Fiddle here):

async Task<DataTable> ProcessAllAsync(IEnumerable<InputObject> allData)
{
    DataTable table = CreateTable();
    int maxDegreeOfParallelism = Environment.ProcessorCount;

    await ParallelForEachAsync(
        allData,
        maxDegreeOfParallelism,
        // Loop body: these Tasks will run in parallel, up to {maxDegreeOfParallelism} at any given time.
        async dataObj =>
        {
            FinalObject o = await Task.Run(() => Process(dataObj)).ConfigureAwait(false); // Thread pool processing.

            await Task.Delay(100).ConfigureAwait(false); // Artificial throttling.

            return o;
        },
        // Completion handler: these will be executed one at a time, and can safely mutate shared state.
        moveObj => table.Rows.Add(moveObj.x)
    );

    return table;
}

struct InputObject
{
    public int x;
}

struct FinalObject
{
    public int x;
}

FinalObject Process(InputObject o)
{
    // Simulate synchronous work.
    Thread.Sleep(100);

    return new FinalObject { x = o.x };
}

Same behaviour, but without Thread.Sleep and ConcurrentBag<T>.

I think something like this should give better performance. object[] looks like a better option than DataRow, since you need a DataTable to get a DataRow object.

ConcurrentBag<object[]> bag = new ConcurrentBag<object[]>();

Parallel.ForEach(allData, 
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, 
    dataObj =>
{
    object[] row = new object[colCount];

    //do processing

    bag.Add(row);

    Thread.Sleep(100);
});

DataTable table = createTable();
foreach (object[] row in bag)
{
    table.Rows.Add(row);
}

Sounds like you've complicated things quite a bit by trying to make everything run in parallel, but if you store DataRow objects in your bag instead of plain objects, at the end you can use DataTableExtensions to create a DataTable from a generic collection quite easily:

var dataTable = bag.CopyToDataTable();

Just add a reference to System.Data.DataSetExtensions in your project.
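For what it's worth, here is a rough sketch of how the DataRow-in-a-bag idea could look end to end. The RowBagExample class, the int-based Process stand-in, and the "x" column are made up for illustration. One caveat the sketch has to work around: a DataRow can only be created through DataTable.NewRow(), and DataTable is not safe for concurrent writes, so row creation is put under a lock while the actual processing still runs in parallel.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;

public static class RowBagExample
{
    // Hypothetical stand-in for the real per-item processing.
    static int Process(int input)
    {
        return input * 2;
    }

    public static DataTable Build(IEnumerable<int> allData)
    {
        // This table only serves as a schema that hands out rows with the right columns.
        DataTable schema = new DataTable();
        schema.Columns.Add("x", typeof(int));

        ConcurrentBag<DataRow> bag = new ConcurrentBag<DataRow>();

        Parallel.ForEach(allData, dataObj =>
        {
            int value = Process(dataObj); // The expensive part runs in parallel.

            DataRow row;
            lock (schema) // Serialise row creation and population.
            {
                row = schema.NewRow(); // Detached row: not yet in any Rows collection.
                row["x"] = value;
            }

            bag.Add(row); // ConcurrentBag itself is thread-safe.
        });

        // CopyToDataTable throws InvalidOperationException when the source has no rows.
        return bag.IsEmpty ? schema.Clone() : bag.CopyToDataTable();
    }
}
```

Whether this beats the object[] approach depends on how much of the time is spent inside the lock, so measure before committing to it.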
