Memory leaks when image discarded in Python

I'm currently writing a simple board game in Python and I just realized that garbage collection doesn't purge the discarded bitmap data from memory when images are reloaded. It happens only when the game is started or loaded or the resolution changes, but it multiplies the memory consumed, so I can't leave this problem unsolved.

When images are reloaded, all references are transferred to the new image data, since it is bound to the same variable the original image data was bound to. I tried to force garbage collection by calling gc.collect(), but it didn't help.

I wrote a small sample to demonstrate my problem.

from tkinter import Button, DISABLED, Frame, Label, NORMAL, Tk
from PIL.Image import open
from PIL.ImageTk import PhotoImage

class App(Tk):
    def __init__(self):
        Tk.__init__(self)
        self.text = Label(self, text = "Please check the memory usage. Then push button #1.")
        self.text.pack()
        self.btn = Button(text = "#1", command = lambda : self.buttonPushed(1))
        self.btn.pack()

    def buttonPushed(self, n):
        "Cycle to open the Tab module n times."
        self.btn.configure(state = DISABLED) # disable to prevent parallel cycles
        if n == 100:
            self.text.configure(text = "Overwriting the bitmap with itself 100 times...\n\nCheck the memory usage!\n\nUI may seem to hang but it will finish soon.")
            self.update_idletasks()
        for i in range(n):      # creates the Tab frame with the img, destroys it, then recreates it to overwrite the previous Frame and previous img
            b = Tab(self)
            b.destroy()
            if n == 100:
                print(i+1,"percent of processing finished.")
        if n == 1:
            self.text.configure(text = "Please check the memory usage now.\nMost of the difference is caused by the bitmap opened.\nNow push button #100.")
            self.btn.configure(text = "#100", command = lambda : self.buttonPushed(100))
        self.btn.configure(state = NORMAL)  # starting cycles is enabled again       

class Tab(Frame):
    """Creates a frame with a picture in it."""
    def __init__(self, master):
        Frame.__init__(self, master = master)
        self.a = PhotoImage(open("map.png"))    # img opened, change this to a valid one to test it
        self.b = Label(self, image = self.a)
        self.b.pack()                           # Label with img appears in Frame
        self.pack()                             # Frame appears

if __name__ == '__main__':
    a = App()
    a.mainloop()

To run the code above you will need a PNG image file. My map.png's dimensions are 1062×1062. As a PNG it is 1.51 MB and as bitmap data it is about 3-3.5 MB. Use a large image to see the memory leak easily.

Expected result when you run my code: Python's process eats up memory cycle by cycle. When it consumes approximately 500 MB, the usage collapses, but then it starts to eat up memory again.

Please give me some advice on how to solve this issue. I'm grateful for any help. Thank you in advance.

First, you definitely do not have a memory leak. If it "collapses" whenever it gets near 500 MB and never crosses it, it can't possibly be leaking.


And my guess is that you don't have any problem at all.

When Python's garbage collector cleans things up (which generally happens immediately when you're done with an object in CPython), it generally doesn't actually release the memory to the OS. Instead, it keeps it around in case you need it later. This is intentional: unless you're thrashing swap, it's a whole lot faster to reuse memory than to keep freeing and reallocating it.
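You can watch this happen with a small, Linux-only sketch (it reads /proc/self/status, so this is an assumption about your platform, and the exact numbers depend on the allocator): it allocates a pile of buffers, frees them, and allocates again; typically the second allocation barely moves the resident set size because the allocator recycles the pages it already holds.

def rss_mb():
    # Read the resident set size from /proc/self/status (Linux-specific).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS"):
                return int(line.split()[1]) // 1024  # value is reported in kB

data = [bytearray(1024) for _ in range(100000)]  # roughly 100 MB of 1 KiB buffers
print("after first allocation:", rss_mb(), "MB")
del data                                         # freed back to the allocator...
print("after del:             ", rss_mb(), "MB") # ...but RSS often stays high
data = [bytearray(1024) for _ in range(100000)]  # reuses the recycled memory
print("after reallocation:    ", rss_mb(), "MB")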

Also, if 500 MB is virtual memory, that's nothing on a modern 64-bit platform. If it's not mapped to physical/resident memory (or is mapped while the computer is idle, but quickly tossed otherwise), it's not a problem; it's just the OS being nice with resources that are effectively free.

More importantly: what makes you think there's a problem? Is there any actual symptom, or just something in Program Manager/Activity Monitor/top/whatever that scares you? (If the latter, take a look at the memory usage of the other programs. On my Mac, I've got 28 programs currently running using over 400 MB of virtual memory, and I'm using 11 out of 16 GB, even though less than 3 GB is actually wired. If I, say, fire up Logic, the memory will be collected faster than Logic can use it; until then, why should the OS waste effort unmapping memory, especially when it has no way to be sure some process won't go ask for the memory it wasn't using later?)


But if there is a real problem, there are two ways to solve it.


The first trick is to do everything memory-intensive in a child process that you can kill and restart to recover the temporary memory (e.g., by using multiprocessing.Process or concurrent.futures.ProcessPoolExecutor).

This usually makes things slower rather than faster. And it's obviously not easy to do when the temporary memory is mostly things that go right into the GUI, and therefore have to live in the main process.
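When the heavy work can be separated from the GUI, a minimal sketch of the idea looks like this (render_map is a hypothetical stand-in for whatever memory-hungry processing you do; only the result crosses back to the parent, and the child's memory is reclaimed by the OS when it exits):

from multiprocessing import Process, Queue

def render_map(path, queue):
    # All the memory-hungry work happens in the child process.
    from PIL import Image
    img = Image.open(path)
    queue.put(img.tobytes())  # only the raw result crosses the boundary

if __name__ == '__main__':
    q = Queue()
    p = Process(target=render_map, args=("map.png", q))
    p.start()
    pixels = q.get()  # fetch the result before joining to avoid a deadlock
    p.join()          # child exits here; the OS reclaims all of its memory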


The other option is to figure out where the memory's being used and not keep so many objects around at the same time. Basically, there are two parts to this:

First, release everything possible before the end of each event handler. This means calling close on files, either del-ing objects or setting all references to them to None, calling destroy on GUI objects that aren't visible, and, most of all, not storing references to things you don't need. (Do you actually need to keep the PhotoImage around after you use it? If you do, is there any way you can load the images on demand?)
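Applied to the Tab class from the question, a minimal sketch of that idea: drop the image references explicitly when the frame is destroyed, so CPython's reference counting can free the bitmap right away.

from tkinter import Frame

class Tab(Frame):
    """Variant of the question's Tab that releases its bitmap on destroy."""
    def destroy(self):
        self.b.configure(image="")  # detach the image from the Label first
        self.a = None               # drop the PhotoImage reference
        self.b = None               # drop the Label reference
        Frame.destroy(self)         # then tear the widget down as usual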

Next, make sure you have no reference cycles. In CPython, garbage is cleaned up immediately as long as there are no cycles; but if there are, the objects sit around until the cycle checker runs. You can use the gc module to investigate this. One really quick thing to do is try this every so often:

import gc

print(gc.get_count())
gc.collect()
print(gc.get_count())

If you see huge drops, you've got cycles. You'll have to look inside gc.get_objects() and gc.garbage, or attach callbacks, or just reason about your code to find exactly where the cycles are. For each one, if you don't really need references in both directions, get rid of one; if you do, change one of them into a weakref.
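As a toy illustration of that last point (Parent and Child are made-up names, not anything from the question): two objects that point at each other form a cycle only the cycle collector can reclaim, while a weak back-reference lets plain reference counting free them as soon as the last outside reference goes away.

import gc
import weakref

class Child:
    def __init__(self, parent):
        self.parent = weakref.ref(parent)  # weak back-reference: no cycle

class Parent:
    def __init__(self):
        self.child = Child(self)

p = Parent()
child = p.child
del p                  # Parent is freed immediately by reference counting
print(child.parent())  # None: the weakref is already dead
print(gc.collect())    # typically 0; this pair never needed the cycle collector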

Saving 500 MB is worthwhile, saving 100 MB is worthwhile, saving 10 MB is worthwhile. Memory is worth its weight in gold, yet many suggest wasting it. It is your decision, definitely; if you want to waste it on your Mac, do it... But it is very sad advice on how to write very poor software.

Use https://pypi.org/project/memory-profiler/ to track your Python memory allocations. To release objects explicitly, use:

import gc

x = someRamConsumingObject()
# do the stuff here ...

# remove the references (del the name, or rebind it to None)
del x
x = None

gc.collect()  # try to force the garbage collector to collect
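For the profiler itself, a minimal sketch of its documented usage: decorate the function you suspect and run the script through the module to get a line-by-line memory report (load_images is a made-up example function):

from memory_profiler import profile

@profile
def load_images():
    # stand-in for loading bitmap data; watch the per-line increment column
    images = [bytearray(10 ** 7) for _ in range(5)]
    return images

if __name__ == '__main__':
    load_images()
    # run with:  python -m memory_profiler this_script.py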

Philosophical discussions aside, real examples from industrial edge computing give us exact reasons why this should be improved. If you run Python in containers you will soon hit the wall, especially with multiple containers running on the edge under heavy production load.

And even if the edge device has 16 GiB, you will hit the wall soon, especially when using data analytics tools like Pandas.

Then, my friend, you will recognize the hell of garbage collectors, and what "not having memory under control" means.

C++ rocks!!!
