
Out of memory exception while using threads

I have the following algorithm:

private void writetodb()
{
    using (var reader = File.OpenRead(@"C:\Data.csv"))
    using (var parser = new TextFieldParser(reader))
    {
        // Do some operations
        while (!parser.EndOfData)
        {
            // Do operations
            // Take 500 rows of data and put them in a dataset
            Thread thread = new Thread(() => WriteTodb(tablename, set));
            thread.Start();
            Thread.Sleep(5000);
        }
    }
}

public void WriteTodb(string table, CellSet set)
{
    //WriteToDB
    //Edit: This statement will write to hbase db in hdinsight
    hbase.StoreCells(table, set);
}

This method works fine for up to 500 MB of data, but beyond that it fails with an out of memory exception.

I am fairly sure that the threads are the cause, but using threads is mandatory and I can't change the architecture.
Can anybody tell me what modifications I have to make to the threading in the above program to avoid the out of memory exception?

First of all, I can't understand your words about threading:

I have to make in thread programming in the above program to avoid memory exception.

If you use the TPL, as has already been suggested, you are still doing thread programming. You don't have to use the Thread class directly if you aren't comfortable with it. You say that your code is C# 4.0, so the TPL is an option for you (note that Task.Run was added in .NET 4.5; on .NET 4.0, use Task.Factory.StartNew instead). A very simple way to do the work looks like this:

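List<Task> tasks = new List<Task>();
while (!parser.EndOfData)
{
    // Read the next 500 rows into a new CellSet here, as in your original loop
    tasks.Add(Task.Run(() => WriteTodb(tablename, set)));
}
// Block until every write has finished
Task.WaitAll(tasks.ToArray());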

The TPL will use the default TaskScheduler, which runs tasks on the internal ThreadPool and balances the number of threads against the resources available on your server.

Also, I see that you're using the HBase client from Microsoft, which has an async method:

public async Task StoreCellsAsync(string table, CellSet cells)
{
}

So you can use the asynchronous approach and the TPL at the same time:

List<Task> tasks = new List<Task>();
while (!parser.EndOfData)
{
    tasks.Add(WriteTodb(tablename, set));
}
// asynchronously await all the writes
await Task.WhenAll(tasks.ToArray());

public async Task WriteTodb(string table, CellSet set)
{
    //WriteToDB
    //Edit: This statement will write to hbase db in hdinsight asynchronously!
    await hbase.StoreCellsAsync(table, set);
}
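Note that starting every write up front keeps every pending batch in memory at once. A minimal sketch of one way to cap that follows, assuming .NET 4.5 or later. The names ThrottledWriter, WriteAllAsync, writeAsync and maxInFlight are made up for illustration, and writeAsync stands in for a call such as hbase.StoreCellsAsync:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThrottledWriter
{
    // Starts at most maxInFlight writes at a time.
    // writeAsync is a hypothetical stand-in for the real asynchronous database call.
    public static async Task WriteAllAsync<T>(
        IEnumerable<T> batches, Func<T, Task> writeAsync, int maxInFlight)
    {
        var running = new List<Task>();
        foreach (T batch in batches)
        {
            if (running.Count >= maxInFlight)
            {
                // Wait for any one write to finish before starting another.
                Task finished = await Task.WhenAny(running);
                running.Remove(finished);
                await finished; // rethrows if that write failed
            }
            running.Add(writeAsync(batch));
        }
        // Wait for the writes that are still in progress.
        await Task.WhenAll(running);
    }
}
```

If batches is a lazy sequence (for example an iterator that reads 500 rows per step), the next batch is only read once a slot is free, so the number of batches held in memory stays close to maxInFlight.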

If, for some reason, you can't use the TPL, you will have to refactor your code and write your own thread scheduling:

  1. You don't have to create a new thread for each write; you can reuse threads.
  2. Running a second operation on an existing thread is, in general, faster than creating a separate thread for each operation.
  3. Split the file into parts, create threads for the writing, and have each thread write its data in a loop.
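The points above can be sketched roughly as follows: a fixed number of long-lived threads take batches from a bounded queue and write them in a loop. This is only an illustration under assumptions: BoundedWriter, WriteAll, workerCount and queueCapacity are made-up names, and write stands in for a call such as hbase.StoreCells. BlockingCollection is available from .NET 4.0.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class BoundedWriter
{
    // Runs workerCount reusable threads that drain a bounded queue.
    // write is a hypothetical stand-in for the real database call.
    public static void WriteAll<T>(IEnumerable<T> batches, Action<T> write,
                                   int workerCount, int queueCapacity)
    {
        using (var queue = new BlockingCollection<T>(queueCapacity))
        {
            var workers = new List<Thread>();
            for (int i = 0; i < workerCount; i++)
            {
                var worker = new Thread(() =>
                {
                    // Blocks until a batch is available; ends after CompleteAdding().
                    // An exception thrown by write is not handled in this sketch.
                    foreach (T batch in queue.GetConsumingEnumerable())
                    {
                        write(batch);
                    }
                });
                worker.Start();
                workers.Add(worker);
            }

            // Add() blocks while the queue is full, so the reader cannot run ahead
            // of the writers and pile up batches in memory.
            foreach (T batch in batches)
            {
                queue.Add(batch);
            }
            queue.CompleteAdding();

            foreach (Thread worker in workers)
            {
                worker.Join();
            }
        }
    }
}
```

The bounded capacity is the part that matters for memory: at most queueCapacity batches wait in the queue, plus one per worker being written.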

Instead of creating a new Thread every time, use ThreadPool.QueueUserWorkItem. For reference, see: https://msdn.microsoft.com/en-us/library/kbf0f1ct(v=vs.110).aspx
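A rough sketch of what that could look like, with a cap on how many batches are queued or running at once. PoolWriter, WriteAll and maxInFlight are names invented for this example, and write stands in for a call such as hbase.StoreCells:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PoolWriter
{
    // Queues each batch on the thread pool, never more than maxInFlight at a time.
    // write is a hypothetical stand-in for the real database call.
    public static void WriteAll<T>(IEnumerable<T> batches, Action<T> write, int maxInFlight)
    {
        using (var slots = new SemaphoreSlim(maxInFlight))
        using (var pending = new CountdownEvent(1)) // the initial 1 belongs to this method
        {
            foreach (T batch in batches)
            {
                T current = batch;   // copy so each work item captures its own batch
                slots.Wait();        // blocks while maxInFlight items are queued or running
                pending.AddCount();
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    // An exception thrown by write is not handled in this sketch.
                    try { write(current); }
                    finally { slots.Release(); pending.Signal(); }
                });
            }
            pending.Signal();        // release this method's own count
            pending.Wait();          // wait until every queued write has finished
        }
    }
}
```

The semaphore replaces the Thread.Sleep(5000) from the question: the reading loop pauses only when the writers are actually behind.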
