
Sequential vs parallel solution memory usage

I have an issue with the following scenario: I'm given a list of ID values, and for each ID I need to run a SELECT query (where the ID is a parameter), then combine all the result sets into one big one and return it to the caller.

Since the query might run for minutes per ID (that's another issue, but for now I treat it as a given fact), and there can be thousands of IDs in the input, I tried to use tasks. With that approach I see a slow but steady increase in memory use.

As a test, I also wrote a simple sequential solution. It has a normal memory usage graph, but as expected it is very slow. Memory rises while it runs, but drops back to the normal level once it finishes.

Here's the skeleton of code:

public class RowItem
{
    public int ID { get; set; }
    public string Name { get; set; }
    //the rest of the properties
}


public List<RowItem> GetRowItems(List<int> customerIDs)
{
    var rowItems = new List<RowItem>();

    // this solution has the memory leak
    var tasks = new List<Task<List<RowItem>>>();
    foreach (var customerID in customerIDs)
    {
        var task = Task.Factory.StartNew(() => ProcessCustomerID(customerID));
        tasks.Add(task);
    }

    while (tasks.Any())
    {
        var index = Task.WaitAny(tasks.ToArray());
        var task = tasks[index];
        rowItems.AddRange(task.Result);
        tasks.RemoveAt(index);
    }

    // this works fine, but is slow
    foreach (var customerID in customerIDs)
    {
        rowItems.AddRange(ProcessCustomerID(customerID));
    }

    return rowItems;
}

private List<RowItem> ProcessCustomerID(int customerID)
{
    var rowItems = new List<RowItem>();
    using (var conn = new OracleConnection("XXX"))
    {
        conn.Open();
        var sql = "SELECT * FROM ...";
        using (var command = new OracleCommand(sql, conn))
        {
            using (var dataReader = command.ExecuteReader())
            {
                using (var dataTable = new DataTable())
                {
                    dataTable.Load(dataReader);
                    rowItems = dataTable
                               .Rows
                               .OfType<DataRow>()
                               .Select(
                                   row => new RowItem
                                   {
                                       ID = Convert.ToInt32(row["ID"]),
                                       Name = row["Name"].ToString(),
                                       //the rest of the properties
                                   })
                               .ToList();
                }
            }
        }
        conn.Close();
    }
    return rowItems;
}

What am I doing wrong when using tasks? According to this MSDN article, I don't need to bother disposing of them manually, but there's barely anything else on the topic. I assume ProcessCustomerID is fine, as it's called in both variations.

update: To log the current memory usage I used Process.GetCurrentProcess().PrivateMemorySize64, but I originally noticed the problem in Task Manager >> Processes.

Using Entity Framework, your ProcessCustomerID method could look like this:

List<RowItem> rowItems;
using (var ctx = new OracleEntities())
{
  rowItems = ctx.Customer
    .Where(o => o.id == customerID)
    .Select(o => new RowItem
      {
        ID = o.id,
        Name = o.Name,
        //the rest of the properties
      })
    .ToList();
}
return rowItems;

Unless you are transferring large amounts of data like images, video, or blobs, this should be near-instantaneous with 1k rows as the result.

If it is unclear what is taking the time, and you use a pre-10g Oracle, it is really hard to monitor this. However, if you use Entity Framework you can attach a profiler to it: http://www.hibernatingrhinos.com/products/efprof

As of at least a year ago, Oracle supported Entity Framework 5.

In the sequential version the queries are executed one by one; in the parallel version they are all started at the same time, consuming your resources and potentially causing contention or deadlocks.
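One way to keep the benefit of parallelism without starting a task per ID is to cap the number of concurrent queries. This is a minimal sketch, not the asker's code: `MaxDegreeOfParallelism = 8` is an arbitrary assumed cap, and the local `ProcessCustomerID` here is a hypothetical stand-in for the real database call from the question.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BoundedQueryRunner
{
    // Hypothetical stand-in for the question's ProcessCustomerID:
    // returns one "row" per ID so the sketch is self-contained.
    private static List<int> ProcessCustomerID(int customerID)
        => new List<int> { customerID };

    public static List<int> GetRowItems(List<int> customerIDs)
    {
        var results = new ConcurrentBag<int>();

        // Parallel.ForEach with a bounded degree of parallelism starts at
        // most N queries at once instead of one task per ID.
        Parallel.ForEach(
            customerIDs,
            new ParallelOptions { MaxDegreeOfParallelism = 8 },
            id =>
            {
                foreach (var item in ProcessCustomerID(id))
                    results.Add(item); // ConcurrentBag is safe for concurrent adds
            });

        return results.ToList();
    }
}
```

A reasonable cap is usually close to the database's connection-pool size; past that, extra tasks only queue up and hold memory.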

I don't think you have any evidence of a memory leak in the parallel execution.

Maybe garbage collection occurs at different times, and that's why you experienced two different readings. You cannot expect it to release memory in real time; .NET garbage collection runs only when required. Have a look at “Fundamentals of Garbage Collection”.

Task Manager or Process.GetCurrentProcess().PrivateMemorySize64 is not a very accurate way to find memory leaks. If you use them, at least force a full garbage collection and wait for pending finalizers before reading the memory counters:

GC.Collect();
GC.WaitForPendingFinalizers();
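Putting those two calls together, a small helper like the following gives a more stable before/after reading. This is a sketch for rough comparisons only; `StableManagedBytes` is a made-up name, and it measures managed heap bytes via GC.GetTotalMemory, not the private bytes shown in Task Manager.

```csharp
using System;

public static class MemoryProbe
{
    // Force a full collection and wait for finalizers before sampling,
    // so transient garbage does not inflate the reading.
    public static long StableManagedBytes()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect(); // collect objects whose finalizers just ran
        return GC.GetTotalMemory(forceFullCollection: true);
    }
}
```

Sample once before the parallel run and once after it completes; only a reading that stays elevated across repeated runs suggests a genuine leak rather than deferred collection.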
