
What is the best way to iterate over a large volume of data in a loop and build a DataTable?

I have a list of objects that need to be converted into a DataTable.

The collection contains 20K items or more.

When I try to iterate over the collection using Parallel.For, it just hangs and takes far too long.

Can anyone suggest the most efficient way to convert a list of objects to a DataTable?

If you already have the objects in memory and must convert them to a DataTable, you are pretty stuck. DataTable isn't thread safe for writes (concurrent reads are fine, but every modification has to be synchronized):

https://social.msdn.microsoft.com/Forums/en-US/ddcdac9d-35e7-4b9f-a367-242bf60c42f2/faq-item-is-datatable-thread-safe

And you are doubling your memory usage, since the data then lives both in your objects and in the DataTable rows.

My only suggestion would be to wrap your existing collection in a class inheriting from DataTable and override or hide its methods so that they reference your underlying list.

However, I think this is unlikely to be a 'good' or easy solution to your problem. The best approach would be to remove the need for the DataTable altogether.
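If the DataTable really cannot be avoided, a plain single-threaded conversion is worth trying before reaching for Parallel.For. The following is a minimal sketch, not taken from either answer: `ListToDataTable` and `ToDataTable<T>` are hypothetical names, it assumes every public readable property of your type should become a column, and it uses reflection, which adds some per-cell cost. `BeginLoadData()`/`EndLoadData()` suspend notifications, index maintenance and constraints while the rows are being added.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not part of any library: converts a sequence of
// objects to a DataTable on a single thread, one column per property.
public static class ListToDataTable
{
    public static DataTable ToDataTable<T>(IEnumerable<T> items)
    {
        // Public instance properties that can be read and are not indexers.
        PropertyInfo[] props = typeof(T)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.CanRead && p.GetIndexParameters().Length == 0)
            .ToArray();

        var table = new DataTable();
        foreach (PropertyInfo p in props)
        {
            // DataColumn does not accept Nullable<X> as its DataType, so use the
            // underlying type; null values are stored as DBNull.Value below.
            Type columnType = Nullable.GetUnderlyingType(p.PropertyType) ?? p.PropertyType;
            table.Columns.Add(p.Name, columnType);
        }

        var values = new object[props.Length]; // reused buffer; Rows.Add copies the values
        table.BeginLoadData();                  // suspend notifications, index maintenance and constraints
        try
        {
            foreach (T item in items)
            {
                for (int i = 0; i < props.Length; i++)
                    values[i] = props[i].GetValue(item) ?? DBNull.Value;
                table.Rows.Add(values);
            }
        }
        finally
        {
            table.EndLoadData();                // re-enable them once all rows are in
        }
        return table;
    }
}
```

Usage, with your own list (here a hypothetical `List<Person> people`): `DataTable dt = ListToDataTable.ToDataTable(people);`. If the reflection cost turns out to matter, the property list can be replaced by hand-written assignments for your concrete type.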

While DataTable operations (including .NewRow()) are not thread safe, your work can still be parallelized by using thread-local variables in the parallel loop:

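using System.Collections.Generic;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

List<string> source = Enumerable.Range(0, 20000).Select(i => i.ToString()).ToList();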
DataTable endResult = CreateEmptyTable();
object lck = new object();

Parallel.For(
    0, source.Count,
    () => CreateEmptyTable(), // method to initialize the thread-local table
    (i, state, threadLocalTable) => // method invoked by the loop on each iteration
    {
        DataRow dr = threadLocalTable.NewRow();

        // running in parallel can only be beneficial 
        // if you do some CPU-heavy conversion in here
        // rather than simple assignment as below
        dr[0] = source[i];

        threadLocalTable.Rows.Add(dr);
        return threadLocalTable;
    },

    // Method to be executed when each partition has completed. 
    localTable =>
    {
        // lock to ensure that the result table is not corrupted
        // by several threads merging into it at the same time
        lock (lck)
        {
            endResult.Merge(localTable);
        }
    }
);

where

    private static DataTable CreateEmptyTable()
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("MyString");
        return dt;
    }

However, parallel execution will only be beneficial if the time saved on the 'your object instance' -> DataRow conversion is greater than the time lost joining the results at the end of the execution (locks + DataTable merges). That is only possible if your conversion is somewhat CPU-heavy. In my example the conversion (dr[0] = source[i]) is not CPU-heavy at all, so sequential execution is preferable.
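The trade-off can be written as a rough inequality. This is a simplified model, not a measurement: it ignores scheduling overhead and assumes the iterations split evenly. Let T_conv be the total time of all object -> DataRow conversions when run sequentially, p the number of worker threads, and T_merge the total time spent in the locked Merge calls (which run one at a time):

$$\frac{T_{\text{conv}}}{p} + T_{\text{merge}} < T_{\text{conv}} \iff T_{\text{merge}} < T_{\text{conv}}\left(1 - \frac{1}{p}\right)$$

For example, with p = 4 the merges must cost less than 75% of the sequential conversion time. Because Merge copies every row into the result table again, a trivial conversion such as dr[0] = source[i] makes that condition hard to meet.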

PS. The above example, modified to run sequentially, completes in under 20 ms on my Intel Core i7-3537U. If your sequential execution times are this low, you may not want to bother with parallel execution at all.
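A sketch of what "modified to run sequentially" could look like, reusing CreateEmptyTable() from the example. The class and method names (`SequentialExample`, `Build`) are hypothetical, and the BeginLoadData()/EndLoadData() pair is an optional addition that was not in the original example. Unlike the parallel version, where the order of the merged rows depends on which thread finishes first, the rows here keep the order of the source list.

```csharp
using System.Collections.Generic;
using System.Data;

// Hypothetical sequential variant of the Parallel.For example above.
public static class SequentialExample
{
    private static DataTable CreateEmptyTable()
    {
        DataTable dt = new DataTable();
        dt.Columns.Add("MyString");
        return dt;
    }

    public static DataTable Build(List<string> source)
    {
        DataTable result = CreateEmptyTable();
        result.BeginLoadData();      // optional: suspend notifications, index maintenance and constraints
        try
        {
            foreach (string s in source)
            {
                // Same conversion as the parallel version, but on one thread
                // writing straight into the final table: no locks, no Merge.
                DataRow dr = result.NewRow();
                dr[0] = s;
                result.Rows.Add(dr);
            }
        }
        finally
        {
            result.EndLoadData();
        }
        return result;
    }
}
```

Wrapping the call to Build in a System.Diagnostics.Stopwatch is an easy way to check whether your own sequential time is already low enough that parallelism isn't worth it.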
