
Memory consumption when initializing object

I am trying to build some objects and insert them into a database. The number of records to insert is large, in the millions. The insert is done in batches. The problem is that I need to initialize new objects to add them to a list, and at the end I bulk insert the list into the database. Because I am initializing a huge number of objects, my computer's memory (RAM) fills up and everything more or less freezes. The question is: from a memory point of view, should I initialize the objects or set them to null? Also, I am trying to work with the same object reference. Am I doing it right?

Code:

QACompleted completed = new QACompleted();
QAUncompleted uncompleted = new QAUncompleted();
QAText replaced = new QAText();

foreach (QAText question in questions)
{
    MatchCollection matchesQ = rgx.Matches(question.Question);
    MatchCollection matchesA = rgx.Matches(question.Answer);

    foreach (GetKeyValues_Result item in values)
    {

        hasNull = false;
        replaced = new QAText();  // <- this object

        if (matchesQ.Count > 0)
        {
            SetQuestion(matchesQ, replaced, question, item);
        }
        else
        {
            replaced.Question = question.Question;
        }

        if (matchesA.Count > 0)
        {
            SetAnswer(matchesA,replaced,question,item);
        }
        else
        {
            replaced.Answer = question.Answer;
        }

        if (!hasNull)
        {
            if (matchesA.Count == 0 && matchesQ.Count == 0)
            {
                completed = new QACompleted();    // <- this object
                MapEmpty(replaced,completed, question.Id);

            }
            else
            {
                completed = new QACompleted();  // <- this object
                MapCompleted(replaced, completed, question.Id, item);
            }


            goodResults.Add(completed);
        }
        else
        {
            uncompleted = new QAUncompleted();     // <- this object
            MapUncompleted(replaced,uncompleted,item, question.Id);

            badResults.Add(uncompleted);
        }
    }
    var success = InsertIntoDataBase(goodResults, "QACompleted");
    var success1 = InsertIntoDataBase(badResults, "QAUncompleted");
}

I have marked the objects. Should I just assign them like replaced = null, or should I use the constructor? What would be the difference between new QAText() and = null?

Instantiating a new (albeit empty) object always takes some memory, as space has to be allocated for the object's fields. If you aren't going to access or set any data in the instance, I see no point in creating it.

It's unfortunate that the code example is not written better. Lots of declarations seem to be left out, and there are undocumented side effects in the code. This makes it very hard to offer specific advice.

That said…

Your replaced object does not appear to be retained beyond one iteration of the loop, so it's not part of the problem. The completed and uncompleted objects are added to lists, so they do add to your memory consumption. Likewise the goodResults and badResults lists themselves (where are the declarations for those?).

If you are using a computer with too little RAM, then yes, you'll run into performance issues as Windows uses the disk to make up for the lack of RAM. And even with enough RAM, at some point you could run into .NET's limitations with respect to object size (i.e. you can only put so many elements into a list). So one way or the other, you seem to need to reduce your peak memory usage.

You stated that when the data in the lists is inserted into the database, the lists are cleared. So presumably there are so many elements in the values list (one of the undeclared, undocumented variables in your code example) that the lists and their objects get too large before the inner loop finishes and the data is inserted into the database.

In that case, the simplest way to address the issue is probably to submit the updates in batches inside the inner foreach loop. E.g. at the end of that loop, add something like this:

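if (goodResults.Count >= 100000)
{
    // Flush early so the list never grows far past the cut-off.
    // Named differently from the outer loop's `success` to avoid a scope clash.
    var goodBatchInserted = InsertIntoDataBase(goodResults, "QACompleted");
}

if (badResults.Count >= 100000)
{
    // Uncompleted rows go to their own table, as in the original code.
    var badBatchInserted = InsertIntoDataBase(badResults, "QAUncompleted");
}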

(Declaring the actual cut-off as a named constant, of course, and handling the database insert's return value as appropriate.)

Of course, you would still do the insert at the end of the outer loop too.
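
The batching idea can also be wrapped in a small helper. This is only a minimal sketch, not the asker's code: BatchBuffer<T>, its flush delegate, and the batch size are hypothetical stand-ins for the real lists and for InsertIntoDataBase, whose signature is not shown in the question. The point it illustrates is that the buffer is cleared after every insert, so peak memory is bounded by the batch size rather than by the size of values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: buffers items and hands them to `flush` in fixed-size batches.
public sealed class BatchBuffer<T>
{
    private readonly List<T> _items;
    private readonly int _batchSize;
    private readonly Action<IReadOnlyList<T>> _flush;

    public BatchBuffer(int batchSize, Action<IReadOnlyList<T>> flush)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        _batchSize = batchSize;
        _flush = flush ?? throw new ArgumentNullException(nameof(flush));
        _items = new List<T>(batchSize); // presized, so the list never has to grow
    }

    // Adds one item and flushes as soon as the batch is full.
    public void Add(T item)
    {
        _items.Add(item);
        if (_items.Count >= _batchSize) Flush();
    }

    // Sends whatever is buffered; call once more after the loops finish.
    // The delegate must not keep the list it receives, because it is cleared next.
    public void Flush()
    {
        if (_items.Count == 0) return;
        _flush(_items);
        _items.Clear(); // drop the references so the GC can reclaim the objects
    }
}
```

With something like this, goodResults.Add(completed) would become a call to Add on a buffer whose delegate performs the database insert, followed by one final Flush() after the outer loop.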

The memory cost of creating objects

Creating objects in C# always has a memory cost. This relates to the memory layout of an object. Assuming a 64-bit process (the usual case on a 64-bit OS), the runtime has to allocate an extra 8 bytes for the sync block and 8 bytes for the method table pointer. Your own data fields follow the sync block and method table pointer. Besides the inevitable 16-byte header, objects are always aligned to an 8-byte boundary and can therefore incur extra padding overhead.

You can roughly estimate the memory overhead if you know how many objects you create. However, I suggest being careful about assuming that your memory pressure comes from object layout overhead, which is why I suggest estimating the overhead as a first step. You might end up realizing that even if the layout overhead could magically be removed completely, it would not make a huge difference in memory usage. After all, for a million objects, the overhead of the object headers is only 16 MB.
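
The estimate described above is simple arithmetic and can be sketched as follows. The ObjectOverhead class and its method names are made up for illustration, and the numbers encode only the assumptions stated in this answer (64-bit process, 16-byte header, 8-byte alignment); the runtime may add its own minimum size or padding rules, so treat the results as rough figures.

```csharp
using System;

public static class ObjectOverhead
{
    // Assumptions taken from the text: 8 bytes for the sync block plus
    // 8 bytes for the method table pointer, sizes rounded up to 8 bytes.
    public const int HeaderBytes = 16;
    public const int Alignment = 8;

    // Rough size of one object whose instance fields occupy `fieldBytes` bytes.
    public static long EstimateObjectBytes(int fieldBytes)
    {
        long raw = HeaderBytes + (long)fieldBytes;
        return (raw + Alignment - 1) / Alignment * Alignment; // round up to alignment
    }

    // Header-only overhead for `count` objects.
    public static long HeaderOverheadBytes(long count)
    {
        return count * HeaderBytes;
    }
}
```

Note that a string or other reference-type field contributes only its 8-byte reference here; the referenced object has its own header and payload, which usually matters far more than the header of the containing object.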

The difference between replaced = new QAText() and replaced = null

I suppose that after you set replaced to null you still have to create another QAText()? If so, memory-wise there is no real difference to the garbage collector. The old QAText instance will be collected either way, as long as nothing else references it. When the instance is collected, however, is the garbage collector's call. Doing replaced = null will not make the GC happen earlier.

You can try to reuse the same QAText instance instead of creating a new one every time. But creating a new one every time will not result in high memory pressure. It will make the GC a little busier and therefore result in higher CPU usage.
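
One caveat on reuse, sketched here with a hypothetical Row class standing in for QACompleted (the real class is not shown in the question): reuse is only safe for an object that is not stored anywhere, such as replaced. If a single instance is reused and added to a list repeatedly, every list entry refers to the same object, so all entries end up showing the last values written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for QACompleted.
public sealed class Row
{
    public int Id { get; set; }
}

public static class ReuseDemo
{
    // Reuses one instance: the list holds `count` references to the same object.
    public static List<Row> BuildWithSharedInstance(int count)
    {
        var rows = new List<Row>(count);
        var shared = new Row();
        for (int i = 0; i < count; i++)
        {
            shared.Id = i;   // overwrites the value seen through every entry
            rows.Add(shared);
        }
        return rows;
    }

    // Allocates per iteration: each entry keeps its own values.
    public static List<Row> BuildWithFreshInstances(int count)
    {
        var rows = new List<Row>(count);
        for (int i = 0; i < count; i++)
        {
            rows.Add(new Row { Id = i });
        }
        return rows;
    }
}
```

So for the objects that go into goodResults and badResults, a new instance per row is needed regardless of the memory cost.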

Identify the real cause for high memory usage

If your application is really using a lot of memory, you have to look at the design of your QACompleted and QAUncompleted objects. Those are the objects added to the lists, and they occupy memory until you submit them to the database. If those objects are designed well (they only take the memory they have to take), then, as Peter pointed out, you should use a smaller batch size so you don't have to keep too many of them in memory.

There are other factors in your program that can possibly cause unexpected memory usage. What is the data structure for goodResults and badResults? Are they List or LinkedList? Internally, List is nothing but a dynamic array. It uses a growth policy that doubles its capacity whenever it is full. The always-double policy can eat up memory quickly, especially when you have a lot of entries.
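
The growth behaviour can be observed through the List<T>.Capacity property. The sketch below (the ListGrowthDemo class is illustrative, not from the question) counts how many times the backing array is replaced while items are added. Each replacement briefly keeps both the old and the new array alive, and passing an expected size to the constructor avoids the regrowth altogether.

```csharp
using System;
using System.Collections.Generic;

public static class ListGrowthDemo
{
    // Counts how often the list's capacity changes while adding `count` items.
    // Every change means a new, larger backing array was allocated and copied into.
    public static int CountReallocations(List<int> list, int count)
    {
        int reallocations = 0;
        int lastCapacity = list.Capacity;
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }
        return reallocations;
    }
}
```

Calling CountReallocations(new List<int>(), 1000) reports several reallocations, while CountReallocations(new List<int>(1000), 1000) reports none, which is why presizing a list to the batch size is a cheap improvement when the batch size is known.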

LinkedList, on the other hand, does not suffer from this problem. But every single node requires roughly 40 extra bytes.

It is also worth checking what the MapCompleted and MapUncompleted methods are doing. Are they keeping a long-lived reference to the replaced object? If so, it will cause a memory leak.

In summary, when dealing with memory problems, you should focus on macro-scope issues such as the choice of data structures or memory leaks, or optimize your algorithms so that you don't have to keep all the data in memory all the time.
