Sequential vs parallel solution memory usage

I have a slight issue with the following scenario: I'm given a list of ID values, and for each one I need to run a SELECT query (with the ID as a parameter), then combine all the result sets into one big list and return it to the caller.

Since the query might run for minutes per ID (that's another issue, but for now I take it as a given), and there can be thousands of IDs in the input, I tried to use tasks. With that approach I see a slow but steady increase in memory use.

As a test, I also made a simple sequential solution. It has a normal memory usage graph but, as expected, it is very slow. Memory increases while it's running, but everything drops back to the normal level when it's finished.

Here's the skeleton of the code:

public class RowItem
{
    public int ID { get; set; }
    public string Name { get; set; }
    //the rest of the properties
}


public List<RowItem> GetRowItems(List<int> customerIDs)
{
    var rowItems = new List<RowItem>();

    // this solution has the memory leak
    var tasks = new List<Task<List<RowItem>>>();
    foreach (var customerID in customerIDs)
    {
        var task = Task.Factory.StartNew(() => ProcessCustomerID(customerID));
        tasks.Add(task);
    }

    while (tasks.Any())
    {
        var index = Task.WaitAny(tasks.ToArray());
        var task = tasks[index];
        rowItems.AddRange(task.Result);
        tasks.RemoveAt(index);
    }

    // this works fine, but slow
    foreach (var customerID in customerIDs)
    {
        rowItems.AddRange(ProcessCustomerID(customerID));
    }

    return rowItems;
}

private List<RowItem> ProcessCustomerID(int customerID)
{
    var rowItems = new List<RowItem>();
    using (var conn = new OracleConnection("XXX"))
    {
        conn.Open();
        var sql = "SELECT * FROM ...";
        using (var command = new OracleCommand(sql, conn))
        {
            using (var dataReader = command.ExecuteReader())
            {
                using (var dataTable = new DataTable())
                {
                    dataTable.Load(dataReader);
                    rowItems = dataTable
                               .Rows
                               .OfType<DataRow>()
                               .Select(
                                   row => new RowItem
                                   {
                                       ID = Convert.ToInt32(row["ID"]),
                                       Name = row["Name"].ToString(),
                                       //the rest of the properties
                                   })
                               .ToList();
                }
            }
        }
        conn.Close();
    }
    return rowItems;
}

What am I doing wrong when using tasks? According to this MSDN article, I don't need to bother disposing of them manually, but there's barely anything else. I guess ProcessCustomerID is OK, as it's called in both variations.

Update: To log the current memory usage I used Process.GetCurrentProcess().PrivateMemorySize64, but I noticed the problem in Task Manager >> Processes.

Using Entity Framework, your ProcessCustomerID method could look like:

List<RowItem> rowItems;
using (var ctx = new OracleEntities())
{
  rowItems = ctx.Customer
    .Where(o => o.id == customerID)
    .Select(o =>
      new RowItem
      {
        // map entity properties directly; o.name is an assumed property name
        ID = o.id,
        Name = o.name,
        //the rest of the properties
      }
    ).ToList();
}
return rowItems;

Unless you are transferring large amounts of data like images, video, data or blobs, this should be near-instantaneous with 1k data as the result.

If it is unclear what is taking the time, and you use a pre-10g Oracle, it would be really hard to monitor this. However, if you use Entity Framework you can attach monitoring to it: http://www.hibernatingrhinos.com/products/efprof

As of at least a year ago, Oracle has supported Entity Framework 5.

In the sequential version the queries are executed one by one; in the parallel version they all get started at the same time, consuming your resources and creating deadlocks.
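One way to keep the parallel version from starting every query at once is to cap the number of concurrent calls. The following is only a minimal sketch of that idea, not code from the question or the answer: `FakeProcessCustomerID` is a hypothetical stand-in that sleeps instead of querying Oracle, and the limit of 4 is an arbitrary assumption you would tune for your database.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledQueries
{
    // Hypothetical stand-in for ProcessCustomerID: sleeps instead of running a query.
    public static List<string> FakeProcessCustomerID(int customerID)
    {
        Thread.Sleep(50);
        return new List<string> { "row-for-" + customerID };
    }

    // Runs the per-ID work with at most maxParallel calls in flight at a time.
    public static List<string> GetRowItems(IEnumerable<int> customerIDs, int maxParallel)
    {
        // Thread-safe collection, because several workers add results concurrently.
        var results = new ConcurrentBag<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(customerIDs, options, customerID =>
        {
            foreach (var row in FakeProcessCustomerID(customerID))
            {
                results.Add(row);
            }
        });

        // The order of the combined rows is not guaranteed.
        return results.ToList();
    }

    public static void Main()
    {
        var rows = GetRowItems(Enumerable.Range(1, 20), maxParallel: 4);
        Console.WriteLine(rows.Count); // prints 20
    }
}
```

With a cap like this, only a handful of connections and intermediate result sets exist at any moment, which should make the memory graph easier to compare with the sequential run.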

I don't think you have any evidence of a memory leak in the parallel execution.

Maybe garbage collection occurs at different times, and that's why you experienced two different readings. You cannot expect it to release memory in real time. .NET garbage collection occurs only when required. Have a look at "Fundamentals of Garbage Collection".

Task Manager or Process.GetCurrentProcess().PrivateMemorySize64 may not be a very accurate way to find memory leaks. If you use them, at least make sure you run a full garbage collection and wait for pending finalizers before reading the memory counters.

GC.Collect();
GC.WaitForPendingFinalizers();
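As a rough illustration of taking readings only after a full collection, here is a small sketch. The helper names (`ManagedBytesAfterFullGC`, `PrivateBytes`, `AllocateAndDrop`) are made up for this example, and the workload just allocates and drops byte arrays in place of the real queries.

```csharp
using System;
using System.Diagnostics;

public static class MemoryProbe
{
    // Forces a full collection and drains finalizers before reading the managed heap size.
    public static long ManagedBytesAfterFullGC()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect(); // second pass collects objects whose finalizers just ran
        return GC.GetTotalMemory(false);
    }

    // Private bytes of the whole process, the counter used in the question.
    public static long PrivateBytes()
    {
        using (var process = Process.GetCurrentProcess())
        {
            return process.PrivateMemorySize64;
        }
    }

    // Hypothetical workload: allocates buffers and drops the references immediately.
    public static void AllocateAndDrop()
    {
        for (int i = 0; i < 1000; i++)
        {
            var buffer = new byte[100000];
            buffer[0] = 1;
        }
    }

    public static void Main()
    {
        long managedBefore = ManagedBytesAfterFullGC();
        long privateBefore = PrivateBytes();

        AllocateAndDrop();

        long managedAfter = ManagedBytesAfterFullGC();
        long privateAfter = PrivateBytes();

        Console.WriteLine("Managed heap: {0} -> {1} bytes", managedBefore, managedAfter);
        Console.WriteLine("Private bytes: {0} -> {1} bytes", privateBefore, privateAfter);
    }
}
```

If the managed heap returns to roughly its starting size after the forced collection while private bytes stay higher, that points to memory the runtime is holding on to rather than to objects that are still referenced.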
