Understanding Python reference counts in order to debug C extensions

I am writing a C extension and want to test it with pytest.

Part of what I am testing is whether the reference counts on my objects are correct. So I built a small test in pure Python, and its results puzzle me.

From IPython I get:

In [1]: x = 153513514215

In [2]: import sys

In [3]: sys.getrefcount(x)
Out[3]: 2

So far so good: 1 reference from the assignment and 1 from the sys.getrefcount() call itself.

However, the following script (stackoverflow_test.py) gives different results:

import sys

def test_ref_count_int():
    x = 677461248192962146784178
    assert sys.getrefcount(x) == 2

def test_ref_count_str():
    y = 'very long and probbably very unique string'
    assert sys.getrefcount(y) == 2

def normal_te_st():
    x = 222677461248192962146784178
    y = '!!!!very long and probbably unique string!!!!'
    print ('x refcount = {}'.format(sys.getrefcount(x)))
    print ('y refcount = {}'.format(sys.getrefcount(y)))

if __name__ == '__main__':
    normal_te_st()

When I run it as a normal Python script:

$ python3 stackoverflow_test.py
x refcount = 4
y refcount = 4

Why 4 and not 2?

When I run it with pytest:

$ python3 -m pytest stackoverflow_test.py
=================== test session starts ===================
platform linux -- Python 3.4.3, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /opt/projects/0001_Intomics/00005_TextMining/jcr/textmining/tests, inifile:
collected 2 items

stackoverflow_test.py FF

======================== FAILURES =========================
___________________ test_ref_count_int ____________________

    def test_ref_count_int():
        x = 677461248192962146784178
>       assert sys.getrefcount(x) == 2
E       assert 3 == 2
E        +  where 3 = <built-in function getrefcount>(677461248192962146784178)
E        +    where <built-in function getrefcount> = sys.getrefcount

stackoverflow_test.py:7: AssertionError
___________________ test_ref_count_str ____________________

    def test_ref_count_str():
        y = 'very long and probbably very unique string'
>       assert sys.getrefcount(y) == 2
E       AssertionError: assert 3 == 2
E        +  where 3 = <built-in function getrefcount>('very long and probbably very unique string')
E        +    where <built-in function getrefcount> = sys.getrefcount

stackoverflow_test.py:11: AssertionError

Why 3 and not 2?

Question: how come

  • python = 4 ref counts
  • pytest = 3 ref counts
  • IPython session = 2 ref counts

I would expect all three cases to behave like the IPython session. Can anybody explain what is going on, and give me some hints on how best to test the objects I am creating?

Literals in your code are stored in a code object. The bytecode stack is another reference:

>>> import dis
>>> def normal_te_st():
...     x = 222677461248192962146784178
...     y = '!!!!very long and probbably unique string!!!!'
...     print ('x refcount = {}'.format(sys.getrefcount(x)))
...     print ('y refcount = {}'.format(sys.getrefcount(y)))
...
>>> normal_te_st.__code__.co_consts
(None, 222677461248192962146784178, '!!!!very long and probbably unique string!!!!', 'x refcount = {}', 'y refcount = {}')
>>> dis.dis(normal_te_st)
  2           0 LOAD_CONST               1 (222677461248192962146784178)
              2 STORE_FAST               0 (x)

  3           4 LOAD_CONST               2 ('!!!!very long and probbably unique string!!!!')
              6 STORE_FAST               1 (y)

  4           8 LOAD_GLOBAL              0 (print)
             10 LOAD_CONST               3 ('x refcount = {}')
             12 LOAD_ATTR                1 (format)
             14 LOAD_GLOBAL              2 (sys)
             16 LOAD_ATTR                3 (getrefcount)
             18 LOAD_FAST                0 (x)
             20 CALL_FUNCTION            1
             22 CALL_FUNCTION            1
             24 CALL_FUNCTION            1
             26 POP_TOP

  5          28 LOAD_GLOBAL              0 (print)
             30 LOAD_CONST               4 ('y refcount = {}')
             32 LOAD_ATTR                1 (format)
             34 LOAD_GLOBAL              2 (sys)
             36 LOAD_ATTR                3 (getrefcount)
             38 LOAD_FAST                1 (y)
             40 CALL_FUNCTION            1
             42 CALL_FUNCTION            1
             44 CALL_FUNCTION            1
             46 POP_TOP
             48 LOAD_CONST               0 (None)
             50 RETURN_VALUE

The LOAD_CONST opcodes load the object from the co_consts tuple attached to the code object; that tuple is one reference. STORE_FAST then puts the object into a local variable; that's the second reference.
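To see that the local variable and the co_consts entry really are the same object, something like the following sketch can help. The function name make_big_int is made up for illustration, and the behaviour shown assumes CPython:

```python
def make_big_int():
    # The literal below is stored in make_big_int.__code__.co_consts
    # when the function is compiled.
    x = 222677461248192962146784178
    return x


value = make_big_int()
consts = make_big_int.__code__.co_consts

# Compare by identity, not equality: the returned object should be the
# very same object that the constants tuple holds a reference to.
shares_constant = any(const is value for const in consts)
print(shares_constant)
```

So the tuple keeps the literal alive (and contributes to its reference count) for as long as the function object exists.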

Then there's the LOAD_FAST opcode: it takes the value of a local variable and pushes it onto the stack, again incrementing the reference count.

Last but not least, you pass that value to the sys.getrefcount() call.
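Because the absolute number depends on how many of these temporary references exist at the moment of the call, it is usually more informative to compare counts relative to each other. A minimal sketch, assuming CPython; the Marker class and the function name are hypothetical stand-ins:

```python
import sys


class Marker:
    """Hypothetical stand-in object; not part of the original post."""


def count_with_extra_name():
    obj = Marker()
    base = sys.getrefcount(obj)        # includes interpreter-internal temporaries
    alias = obj                        # one additional real reference
    with_alias = sys.getrefcount(obj)
    del alias                          # drop that reference again
    after_del = sys.getrefcount(obj)
    return base, with_alias, after_del


base, with_alias, after_del = count_with_extra_name()
print(with_alias - base, after_del - base)
```

The value of base may differ between interpreter versions, but the differences should not: binding one more name adds exactly one reference, and deleting it removes one.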

If you want to learn what references your objects, you may want to look at gc.get_referrers(); this function excludes itself and the stack when called, so you can mentally add +2:

>>> import gc
>>> def gc_demo():
...     x = 222677461248192962146784178
...     print(gc.get_referrers(x))
...
>>> gc_demo()
[(None, 222677461248192962146784178), <frame object at 0x106a25a98>]

That prints 2 objects: the co_consts tuple and the current call frame (for the locals).
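The same function can be used to check whether a particular container is holding on to one of your objects. This is only a sketch with hypothetical names (Payload, holder_is_reported); note that gc.get_referrers() can only report containers that the garbage collector tracks, such as lists and dicts:

```python
import gc


class Payload:
    """Hypothetical stand-in for an object created by the extension."""


def holder_is_reported():
    obj = Payload()
    holder = [obj]  # a list is a GC-tracked container
    referrers = gc.get_referrers(obj)
    # Look for the list by identity among the reported referrers.
    return any(referrer is holder for referrer in referrers)


print(holder_is_reported())
```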

py.test does some additional import-time magic which rewrites assert statements, and as a result the reference count is different again.

You may also want to read the Reference Counts section of the Extending Python with C or C++ documentation, the Objects, Types and Reference Counts section of the C API Reference Manual, and last but not least the Debugging Builds section of the same manual, to learn how to create a Python build that helps you trace reference counts in detail.

You should never rely on a specific number of references to an object. I can trivially add more references to your objects by reaching into the function object, for example (foo = normal_te_st.__code__.co_consts[1] would increment the reference count before even running the function). What exactly causes the reference count to go up is an implementation detail. Just make sure your own code handles references correctly.
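One way to test reference handling without asserting an absolute count is to check that an object is actually freed once your code is done with it. The sketch below assumes CPython (where an object is deallocated as soon as its last reference goes away) and uses hypothetical names; Payload stands in for your extension type, which would itself need to support weak references for this approach to work:

```python
import weakref


class Payload:
    """Hypothetical stand-in for an object created by the extension."""


def is_released_after(use):
    """Run use(obj), drop our own reference, and report whether obj was freed."""
    obj = Payload()
    ref = weakref.ref(obj)  # a weak reference does not keep obj alive
    use(obj)
    del obj
    return ref() is None


leaky_store = []
print(is_released_after(lambda o: None))     # the callable keeps nothing
print(is_released_after(leaky_store.append)) # the list keeps a reference
```

A callable that forgets to release a reference (simulated here by appending to a list) keeps the object alive, so the second check reports that it was not freed.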
