
Prevent TextIOWrapper from closing on GC in a Py2/Py3 compatible way

What I need to accomplish:

Given a binary file, decode it in a couple of different ways, each providing a TextIOBase API. Ideally these derived files can get passed on without my needing to keep track of their lifespan explicitly.

Unfortunately, wrapping a BufferedReader in a TextIOWrapper will result in the reader being closed when the TextIOWrapper goes out of scope.

Here is a simple demo of this:

In [1]: import io

In [2]: def mangle(x):
   ...:     io.TextIOWrapper(x) # Will get GCed causing __del__ to call close
   ...:     

In [3]: f = io.open('example', mode='rb')

In [4]: f.closed
Out[4]: False

In [5]: mangle(f)

In [6]: f.closed
Out[6]: True

I can fix this in Python 3 by overriding __del__ (this is a reasonable solution for my use case, as I have complete control over the decoding process; I just need to expose a very uniform API at the end):

In [1]: import io

In [2]: class MyTextIOWrapper(io.TextIOWrapper):
   ...:     def __del__(self):
   ...:         print("I've been GC'ed")
   ...:         

In [3]: def mangle2(x):
   ...:     MyTextIOWrapper(x)
   ...:     

In [4]: f2 = io.open('example', mode='rb')

In [5]: f2.closed
Out[5]: False

In [6]: mangle2(f2)
I've been GC'ed

In [7]: f2.closed
Out[7]: False

However, this does not work in Python 2:

In [7]: class MyTextIOWrapper(io.TextIOWrapper):
   ...:     def __del__(self):
   ...:         print("I've been GC'ed")
   ...:         

In [8]: def mangle2(x):
   ...:     MyTextIOWrapper(x)
   ...:     

In [9]: f2 = io.open('example', mode='rb')

In [10]: f2.closed
Out[10]: False

In [11]: mangle2(f2)
I've been GC'ed

In [12]: f2.closed
Out[12]: True

I've spent a bit of time staring at the Python source code, and it looks remarkably similar between 2.7 and 3.4, so I don't understand why the __del__ inherited from IOBase is not overridable in Python 2 (or even visible in dir), but still seems to get executed. Python 3 works exactly as expected.

Is there anything I can do?

Just detach your TextIOWrapper() object before letting it be garbage collected:

def mangle(x):
    wrapper = io.TextIOWrapper(x)
    wrapper.detach()

The TextIOWrapper() object only closes streams it is attached to. If you can't alter the code where the object goes out of scope, then simply keep a reference to the TextIOWrapper() object locally and detach it at that point.
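As a quick sanity check, here is a sketch (Python 3, with a throwaway temp file standing in for the real binary file) showing that detaching before the wrapper is collected leaves the underlying stream open:

```python
import io
import os
import tempfile

# Throwaway binary file standing in for the real input.
fd, path = tempfile.mkstemp()
os.write(fd, b"hello\n")
os.close(fd)

raw = io.open(path, mode='rb')

def mangle(x):
    # Detach before the wrapper goes out of scope, so its
    # garbage collection can no longer close the underlying stream.
    wrapper = io.TextIOWrapper(x)
    wrapper.detach()

mangle(raw)
closed_after_mangle = raw.closed
print(closed_after_mangle)  # → False: the binary stream survived

raw.close()
os.remove(path)
```

Keep in mind that detach() leaves the wrapper itself unusable afterwards; it is only appropriate when you are genuinely done with the text view.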

If you must subclass TextIOWrapper(), then just call detach() in the __del__ hook:

class DetachingTextIOWrapper(io.TextIOWrapper):
    def __del__(self):
        self.detach()

EDIT:

Just call detach first, thanks martijn-pieters!


It turns out there is basically nothing that can be done about the destructor calling close in Python 2.7; this is hardcoded into the C code. Instead, we can modify close so that it won't close the buffer while __del__ is happening (__del__ is executed before _PyIOBase_finalize in the C code, giving us a chance to change the behaviour of close). This lets close work as expected without letting the GC close the buffer.

class SaneTextIOWrapper(io.TextIOWrapper):
    def __init__(self, *args, **kwargs):
        self._should_close_buffer = True
        super(SaneTextIOWrapper, self).__init__(*args, **kwargs)

    def __del__(self):
        # Accept the inevitability of the buffer being closed by the destructor
        # because of this line in Python 2.7:
        # https://github.com/python/cpython/blob/2.7/Modules/_io/iobase.c#L221
        self._should_close_buffer = False
        self.close()  # Actually close for Python 3 because it is an override.
                      # We can't call super because Python 2 doesn't actually
                      # have a `__del__` method for IOBase (hence this
                      # workaround). Close is idempotent so it won't matter
                      # that Python 2 will end up calling this twice

    def close(self):
        # We can't stop Python 2.7 from calling close in the destructor
        # so instead we can prevent the buffer from being closed with a flag.

        # Based on:
        # https://github.com/python/cpython/blob/2.7/Lib/_pyio.py#L1586
        # https://github.com/python/cpython/blob/3.4/Lib/_pyio.py#L1615
        if self.buffer is not None and not self.closed:
            try:
                self.flush()
            finally:
                if self._should_close_buffer:
                    self.buffer.close()
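For what it's worth, the behaviour can be sanity-checked under Python 3 with a standalone snippet like the following (the class is repeated so the snippet runs on its own, and the temp file is just a stand-in for a real input):

```python
import io
import os
import tempfile

class SaneTextIOWrapper(io.TextIOWrapper):
    def __init__(self, *args, **kwargs):
        self._should_close_buffer = True
        super(SaneTextIOWrapper, self).__init__(*args, **kwargs)

    def __del__(self):
        # Skip closing the buffer when we are being finalized.
        self._should_close_buffer = False
        self.close()

    def close(self):
        if self.buffer is not None and not self.closed:
            try:
                self.flush()
            finally:
                if self._should_close_buffer:
                    self.buffer.close()

fd, path = tempfile.mkstemp()
os.write(fd, b"hello\n")
os.close(fd)

raw = io.open(path, mode='rb')

def mangle(x):
    SaneTextIOWrapper(x)  # dropped immediately, triggering __del__

mangle(raw)
closed_after_gc = raw.closed  # __del__ ran but did not close the buffer

raw.close()
os.remove(path)
```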

My previous solution here used _pyio.TextIOWrapper, which is slower than the above because it is written in Python, not C.

It involved simply overriding __del__ with a no-op, which also works in Py2/3.

A simple solution would be to return the variable from the function and store it in script scope, so that it does not get garbage collected until the script ends or the reference to it changes. But there may be other, more elegant solutions out there.
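That keep-a-reference approach can be sketched like so (hypothetical temp file again):

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"data\n")
os.close(fd)

def decode(raw):
    # Return the wrapper rather than letting it die inside the function;
    # the caller's reference keeps it, and the stream it wraps, alive.
    return io.TextIOWrapper(raw)

raw = io.open(path, mode='rb')
text = decode(raw)               # reference held in script scope
closed_while_held = raw.closed   # False while `text` is alive

text.close()                     # closing the wrapper closes the stream too
os.remove(path)
```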

EDIT:

I found a much better solution (comparatively), but I will leave this answer here in case it is useful for anyone to learn from. (It is a pretty easy way to show off gc.garbage.)

Please do not actually use what follows.

OLD:

I found a potential solution, though it is horrible:

What we can do is set up a cyclic reference in the destructor, which will hold off the GC event. We can then look at gc.garbage to find these uncollectable objects, break the cycle, and drop the reference.

In [1]: import io

In [2]: class MyTextIOWrapper(io.TextIOWrapper):
   ...:     def __del__(self):
   ...:         if not hasattr(self, '_cycle'):
   ...:             print "holding off GC"
   ...:             self._cycle = self
   ...:         else:
   ...:             print "getting GCed!"
   ...:

In [3]: def mangle(x):
   ...:     MyTextIOWrapper(x)
   ...:     

In [4]: f = io.open('example', mode='rb')

In [5]: mangle(f)
holding off GC

In [6]: f.closed
Out[6]: False

In [7]: import gc

In [8]: gc.garbage
Out[8]: []

In [9]: gc.collect()
Out[9]: 34

In [10]: gc.garbage
Out[10]: [<_io.TextIOWrapper name='example' encoding='UTF-8'>]

In [11]: gc.garbage[0]._cycle=False

In [12]: del gc.garbage[0]
getting GCed!

In [13]: f.closed
Out[13]: True

Truthfully, this is a pretty horrific workaround, but it could be transparent to the API I am delivering. Still, I would prefer a way to override the __del__ of IOBase.
