Multi Threading with LINQ to SQL

I am writing a WinForms application. I am pulling data from my database, performing some actions on that data set and then plan to save it back to the database. I am using LINQ to SQL to perform the query to the database because I am only concerned with 1 table in our database so I didn't want to implement an entire ORM for this.

I have it pulling the dataset from the DB. However, the dataset is rather large, so what I am currently trying to do is split it into 4 roughly equal-sized lists (List<object>).

Then I have a separate background worker to run through each of those lists, perform the action and report its progress while doing so. I have it planned to consolidate those sections into one big list once all 4 background workers have finished processing their section.

But I keep getting an error while the background workers are processing their unique list. Do the objects maintain their tie to the DataContext for the LINQ to SQL even though they have been converted to List objects? Any ideas how to fix this? I have minimal experience with multi-threading so if I am going at this completely wrong, please tell me.

Thanks guys. If you need any code snippets or any other information just ask.

Edit: Oops. I completely forgot to give the error message. In the DataContext designer.cs, the SendPropertyChanging function throws the error "An item with the same key has already been added."

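private void Setup(){
    List<MyObject> quarter1 = _listFromDB.Take(5000).ToList();
    // pass the list as the worker argument; a local is not visible in DoWork
    bgw1.RunWorkerAsync(quarter1);
}

private void bgw1_DoWork(object sender, DoWorkEventArgs e){
    var quarter1 = (List<MyObject>)e.Argument;
    e.Result = functionToExecute(bgw1, quarter1);
}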

private List<MyObject> functionToExecute(BackgroundWorker caller, List<MyObject> myList)
    {
        int progress = 0;
        foreach (MyObject obj in myList)
        {
            string newString1 = createString();
            obj.strText = newString1;
            //report progress here
            caller.ReportProgress(progress++);
        }
        return myList;
    }

This same function is called by all four workers, and each worker passes its own list as myList.

Because a real answer has yet to be posted, I'll give it a shot. Given that you haven't shown any LINQ-to-SQL code (no usage of DataContext) - I'll take an educated guess that the DataContext is shared between the threads, for example:

using (MyDataContext context = new MyDataContext())
{
    // this is just some random query that has not been materialized with ToList(),
    // so query execution is deferred: listFromDB is an IQueryable<>
    var listFromDB = context.SomeTable.Where(st => st.Something == true);

    System.Threading.Tasks.Task.Factory.StartNew(() => 
    {
        var list1 = listFromDB.Take(5000).ToList(); // runs the SQL query
        // call some function on list1
    });

    System.Threading.Tasks.Task.Factory.StartNew(() => 
    {
        var list2 = listFromDB.Take(5000).ToList(); // runs the SQL query
        // call some function on list2
    });
}

Now, the error you got ("An item with the same key has already been added.") occurs because the DataContext object is not thread-safe. A lot happens in the background: the DataContext has to load objects from SQL, track their states, and so on. This background work is what throws the error, because each thread runs the query and therefore accesses the same DataContext.

At least, this is my own experience: I came across the same error while sharing a DataContext between multiple threads. You have two main options in this scenario:

1) Before starting the threads, call .ToList() on the query, so that listFromDB is no longer an IQueryable<> but an actual List<>. This means the query has already run, and the threads read from an actual List instead of querying through the DataContext.

2) Move the DataContext definition into each thread. Because the DataContext is no longer shared, no more errors.

A third option would be to rewrite the scenario into something else, as you did (for example, make everything sequential on a single background thread).
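Option 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: FakeContext is a hypothetical stand-in for the generated DataContext (so the example runs without a database), and its plain Dictionary mimics tracking state that is not thread-safe. ProcessInChunks and the "item-" values are made up for the example as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PerThreadContextSketch
{
    // Hypothetical stand-in for a LINQ to SQL DataContext.
    // The Dictionary is NOT thread-safe, so an instance must not be shared.
    public sealed class FakeContext : IDisposable
    {
        private readonly Dictionary<int, string> _tracked = new Dictionary<int, string>();
        public int TrackedCount => _tracked.Count;
        public void Track(int id, string value) => _tracked.Add(id, value);
        public void Dispose() { }
    }

    // Splits ids into one chunk per worker; every worker creates, uses
    // and disposes its own context, so no context is touched by two threads.
    public static int[] ProcessInChunks(IReadOnlyList<int> ids, int workers)
    {
        int chunkSize = (ids.Count + workers - 1) / workers; // round up
        var tasks = Enumerable.Range(0, workers).Select(w => Task.Run(() =>
        {
            using (var context = new FakeContext())
            {
                foreach (int id in ids.Skip(w * chunkSize).Take(chunkSize))
                {
                    context.Track(id, "item-" + id);
                }
                return context.TrackedCount;
            }
        })).ToArray();

        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

With a real DataContext, each worker would also run its own query and call SubmitChanges on its own context before disposing it.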

First of all, I don't really see why you'd need multiple worker threads at all. Are these lists in separate databases, tables, or servers? Do you really want to show 4 progress bars if you have 4 lists, or are you somehow merging these progress reports into one weird progress bar? :D

Also, you're trying to speed up processing updates to your database, but you never tell LINQ to SQL to save anything, so you're not really batching transactions. You'll just save everything at the end in one big transaction. Is that really what you're aiming for? The progress bar will stop at 100% and then the application will spend a lot of time on the SQL side.

Just create one background thread and process everything sequentially, but batch a save transaction every so many rows (I'd suggest something like every 1000 rows, but you should experiment with this). It'll be fast, even with millions of rows.
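A minimal sketch of that batching loop, assuming the real save is a call such as SubmitChanges on the DataContext. The saveBatch callback here is a hypothetical placeholder for that call, and ProcessInBatches is a made-up helper name:

```csharp
using System;
using System.Collections.Generic;

public static class BatchSaveSketch
{
    // Processes rows one by one and invokes saveBatch after every
    // batchSize rows, plus once more for a trailing partial batch.
    // Returns how many times saveBatch was invoked.
    public static int ProcessInBatches<T>(
        IEnumerable<T> rows, Action<T> process, Action saveBatch, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        int pending = 0;
        int saves = 0;
        foreach (T row in rows)
        {
            process(row);
            pending++;
            if (pending == batchSize)
            {
                saveBatch(); // hypothetical stand-in for context.SubmitChanges()
                saves++;
                pending = 0;
            }
        }

        if (pending > 0)
        {
            saveBatch(); // flush the last, smaller batch
            saves++;
        }
        return saves;
    }
}
```

For example, 2500 rows with a batch size of 1000 trigger three saves: two full batches and one of 500 rows. Progress can be reported from the same loop, so the progress bar reflects saved work rather than stalling at 100%.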

If you really need this multithreaded solution: The "another blabla with the same key has been added" error suggests that you are adding the same item to multiple "mylists", or adding the same item to the same list twice, otherwise how would there be any errors at all?

Using Parallel LINQ (PLINQ), you can take advantage of multiple CPU cores to process your data. But if your application is going to run on a single-core CPU, splitting the data into pieces won't give you any performance benefit; instead, it will incur some context-switching overhead.
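A small PLINQ sketch, assuming the per-row work is CPU-bound and runs on plain in-memory values rather than on entities still attached to a shared DataContext. CreateString is a hypothetical stand-in for the per-row work in the question:

```csharp
using System;
using System.Linq;

public static class PlinqSketch
{
    // Hypothetical per-row computation.
    public static string CreateString(int id) => "item-" + id;

    // AsParallel spreads the work over the available cores;
    // AsOrdered keeps the results in the same order as the input.
    public static string[] ProcessAll(int[] ids) =>
        ids.AsParallel()
           .AsOrdered()
           .Select(CreateString)
           .ToArray();
}
```

For work as cheap as this string concatenation, the parallel overhead will likely outweigh the gain, so it is worth measuring before committing to this approach.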

Hope it Helps
