
What is the best way to iterate over a large volume of data in a loop and build a DataTable?

I have a list of objects that needs to be converted into a DataTable.

The collection contains 20K items or more.

When I try to iterate over the collection using Parallel.For, it just hangs and takes far too long.

Can anyone suggest the most efficient way to convert a List of objects to a DataTable?

If you already have the objects in memory and must convert them to a DataTable, you are pretty screwed. DataTable isn't thread safe:

https://social.msdn.microsoft.com/Forums/en-US/ddcdac9d-35e7-4b9f-a367-242bf60c42f2/faq-item-is-datatable-thread-safe

And you are doubling your memory usage, since every object ends up stored both in the list and in the table.

My only suggestion would be that perhaps you can wrap your existing collection in an object inheriting from DataTable and override or hide the methods so that they reference your underlying list.

However, I think this is unlikely to be a 'good' or easy solution to your problem. The best approach would be to remove the need for the DataTable altogether.

While DataTable operations (including .NewRow()) are not thread safe, your work can still be parallelized using thread-local variables in the parallel loop:

// Namespaces needed: System.Collections.Generic, System.Data, System.Linq, System.Threading.Tasks
List<string> source = Enumerable.Range(0, 20000).Select(i => i.ToString()).ToList();
DataTable endResult = CreateEmptyTable();
object lck = new object();

Parallel.For(
    0, source.Count,
    () => CreateEmptyTable(), // method to initialize the thread-local table
    (i, state, threadLocalTable) => // method invoked by the loop on each iteration
    {
        DataRow dr = threadLocalTable.NewRow();

        // running in parallel can only be beneficial 
        // if you do some CPU-heavy conversion in here
        // rather than simple assignment as below
        dr[0] = source[i];

        threadLocalTable.Rows.Add(dr);
        return threadLocalTable;
    },

    // Method to be executed when each partition has completed. 
    localTable =>
    {
        // lock to ensure that the result table 
        // is not screwed by merging from multiple threads simultaneously
        lock (lck)
        {
            endResult.Merge(localTable);
        }
    }
);

where

    private static DataTable CreateEmptyTable()
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("MyString");
        return dt;
    }

However, the parallel execution will only be beneficial if the time saved on the conversion 'your object instance' -> DataRow is greater than the time lost on joining the results at the end of the execution (locks + DataTable merges), which is only possible if your conversion is somewhat CPU-heavy. In my example the conversion (dr[0] = source[i]) is not CPU-heavy at all, and hence sequential execution is preferable.
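To make the 'your object instance' -> DataRow step concrete, here is a minimal sequential sketch for objects with several properties. It is not part of the original answer: the class name ListToDataTable and the method Convert are hypothetical, and it assumes every public, readable, non-indexed instance property should become a column.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Reflection;

public static class ListToDataTable
{
    // Hypothetical helper: builds one column per public instance property
    // of T and copies each item into a new row, on a single thread.
    public static DataTable Convert<T>(IEnumerable<T> items)
    {
        // Reflect over the type once, outside the row loop.
        PropertyInfo[] props = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .ToArray();

        DataTable table = new DataTable();
        foreach (PropertyInfo p in props)
        {
            // A DataColumn cannot be typed as Nullable<T>,
            // so fall back to the underlying type for nullable properties.
            Type columnType = Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType;
            table.Columns.Add(p.Name, columnType);
        }

        object[] values = new object[props.Length];

        // Suspend notifications, index maintenance and constraints during the bulk load.
        table.BeginLoadData();
        try
        {
            foreach (T item in items)
            {
                for (int i = 0; i < props.Length; i++)
                {
                    // Null property values are stored as DBNull in the table.
                    values[i] = props[i].GetValue(item) ?? DBNull.Value;
                }
                table.Rows.Add(values);
            }
        }
        finally
        {
            table.EndLoadData();
        }
        return table;
    }
}
```

Reflection per property is usually the most expensive part of a conversion like this, so it is worth timing this sequential version before reaching for Parallel.For.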

PS. The above example, modified to run sequentially, completes in under 20 ms on my Intel Core i7-3537U. If your sequential execution times are low, you may not want to bother with parallel execution at all.
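For reference, a sequential variant of the example might look roughly like the sketch below. It is an assumption about what "modified to run sequentially" means, not the exact code that was timed; the names SequentialConversion, ToTable and TimeConversion are hypothetical, and timings will differ from machine to machine.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;
using System.Linq;

public static class SequentialConversion
{
    // Same single-column table shape as CreateEmptyTable() in the parallel example.
    public static DataTable ToTable(IReadOnlyList<string> source)
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("MyString");

        foreach (string item in source)
        {
            DataRow dr = dt.NewRow();
            dr[0] = item;      // the cheap "conversion" step
            dt.Rows.Add(dr);
        }
        return dt;
    }

    // Measures one conversion of 'count' generated strings, in milliseconds.
    public static long TimeConversion(int count)
    {
        List<string> source = Enumerable.Range(0, count).Select(i => i.ToString()).ToList();
        Stopwatch sw = Stopwatch.StartNew();
        DataTable result = ToTable(source);
        sw.Stop();
        Console.WriteLine($"{result.Rows.Count} rows in {sw.ElapsedMilliseconds} ms");
        return sw.ElapsedMilliseconds;
    }
}
```

Calling SequentialConversion.TimeConversion(20000) gives a baseline to compare against the Parallel.For version on your own data and hardware.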

