简体   繁体   English

我如何理解 .pyc 文件内容

[英]How can I understand a .pyc file content

I have a .pyc file.我有一个.pyc文件。 I need to understand the content of that file to know how the disassembler works of python, ie how can I generate a output like dis.dis(function) from .pyc file content.我需要了解该文件的内容以了解反汇编程序如何在 python 中工作,即如何从.pyc文件内容生成类似dis.dis(function)的输出。

for eg例如

>>> def sqr(x):  
...     return x*x
...
>>> import dis
>>> dis.dis(sqr)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY     
              7 RETURN_VALUE        

I need to get a output like this using the .pyc file.我需要使用.pyc文件获得这样的输出。

.pyc files contain some metadata and a marshal ed code object; .pyc文件包含一些元数据和一个marshal code对象; to load the code object and disassemble that use:加载code对象并反汇编使用:

import dis, marshal, sys

header_sizes = [
    # (size, first version this applies to)
    # pyc files were introduced in 0.9.2 way, way back in June 1991.
    (8,  (0, 9, 2)),  # 2 bytes magic number, \r\n, 4 bytes UNIX timestamp
    (12, (3, 6)),     # added 4 bytes file size
    # bytes 4-8 are flags, meaning of 9-16 depends on what flags are set
    # bit 0 not set: 9-12 timestamp, 13-16 file size
    # bit 0 set: 9-16 file hash (SipHash-2-4, k0 = 4 bytes of the file, k1 = 0)
    (16, (3, 7)),     # inserted 4 bytes bit flag field at 4-8 
    # future version may add more bytes still, at which point we can extend
    # this table. It is correct for Python versions up to 3.9
]
header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)

with open(pycfile, "rb") as f:
    metadata = f.read(header_size)  # first header_size bytes are metadata
    code = marshal.load(f)          # rest is a marshalled code object

dis.dis(code)

Demo with the bisect module:使用bisect模块进行演示:

>>> import bisect
>>> import dis, marshal
>>> import sys
>>> header_sizes = [(8, (0, 9, 2)), (12, (3, 6)), (16, (3, 7))]
>>> header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)
>>> pycfile = getattr(bisect, '__cached__', pycfile.__file__)
>>> with open(pycfile, "rb") as f:
...     metadata = f.read(header_size)  # first header_size bytes are metadata
...     code = marshal.load(f)          # rest is bytecode
... 
>>> dis.dis(code)
  1           0 LOAD_CONST               0 ('Bisection algorithms.')
              2 STORE_NAME               0 (__doc__)

  3           4 LOAD_CONST              12 ((0, None))
              6 LOAD_CONST               3 (<code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>)
              8 LOAD_CONST               4 ('insort_right')
             10 MAKE_FUNCTION            1 (defaults)
             12 STORE_NAME               1 (insort_right)

 15          14 LOAD_CONST              13 ((0, None))
             16 LOAD_CONST               5 (<code object bisect_right at 0x10694f2f0, file "/.../lib/python3.8/bisect.py", line 15>)
             18 LOAD_CONST               6 ('bisect_right')
             20 MAKE_FUNCTION            1 (defaults)
             22 STORE_NAME               2 (bisect_right)

 36          24 LOAD_CONST              14 ((0, None))
             26 LOAD_CONST               7 (<code object insort_left at 0x10694f240, file "/.../lib/python3.8/bisect.py", line 36>)
             28 LOAD_CONST               8 ('insort_left')
             30 MAKE_FUNCTION            1 (defaults)
             32 STORE_NAME               3 (insort_left)

 49          34 LOAD_CONST              15 ((0, None))
             36 LOAD_CONST               9 (<code object bisect_left at 0x10694f190, file "/.../lib/python3.8/bisect.py", line 49>)
             38 LOAD_CONST              10 ('bisect_left')
             40 MAKE_FUNCTION            1 (defaults)
             42 STORE_NAME               4 (bisect_left)

 71          44 SETUP_FINALLY           12 (to 58)

 72          46 LOAD_CONST               1 (0)
             48 LOAD_CONST              11 (('*',))
             50 IMPORT_NAME              5 (_bisect)
             52 IMPORT_STAR
             54 POP_BLOCK
             56 JUMP_FORWARD            20 (to 78)

 73     >>   58 DUP_TOP
             60 LOAD_NAME                6 (ImportError)
             62 COMPARE_OP              10 (exception match)
             64 POP_JUMP_IF_FALSE       76
             66 POP_TOP
             68 POP_TOP
             70 POP_TOP

 74          72 POP_EXCEPT
             74 JUMP_FORWARD             2 (to 78)
        >>   76 END_FINALLY

 77     >>   78 LOAD_NAME                2 (bisect_right)
             80 STORE_NAME               7 (bisect)

 78          82 LOAD_NAME                1 (insort_right)
             84 STORE_NAME               8 (insort)
             86 LOAD_CONST               2 (None)
             88 RETURN_VALUE

Disassembly of <code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>:
 12           0 LOAD_GLOBAL              0 (bisect_right)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 13          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

Disassembly of <code object bisect_right at 0x10694f2f0, file "/.../lib/python3.8/bisect.py", line 15>:
 26           0 LOAD_FAST                2 (lo)
              2 LOAD_CONST               1 (0)
              4 COMPARE_OP               0 (<)
              6 POP_JUMP_IF_FALSE       16

 27           8 LOAD_GLOBAL              0 (ValueError)
             10 LOAD_CONST               2 ('lo must be non-negative')
             12 CALL_FUNCTION            1
             14 RAISE_VARARGS            1

 28     >>   16 LOAD_FAST                3 (hi)
             18 LOAD_CONST               3 (None)
             20 COMPARE_OP               8 (is)
             22 POP_JUMP_IF_FALSE       32

 29          24 LOAD_GLOBAL              1 (len)
             26 LOAD_FAST                0 (a)
             28 CALL_FUNCTION            1
             30 STORE_FAST               3 (hi)

 30     >>   32 LOAD_FAST                2 (lo)
             34 LOAD_FAST                3 (hi)
             36 COMPARE_OP               0 (<)
             38 POP_JUMP_IF_FALSE       80

 31          40 LOAD_FAST                2 (lo)
             42 LOAD_FAST                3 (hi)
             44 BINARY_ADD
             46 LOAD_CONST               4 (2)
             48 BINARY_FLOOR_DIVIDE
             50 STORE_FAST               4 (mid)

 32          52 LOAD_FAST                1 (x)
             54 LOAD_FAST                0 (a)
             56 LOAD_FAST                4 (mid)
             58 BINARY_SUBSCR
             60 COMPARE_OP               0 (<)
             62 POP_JUMP_IF_FALSE       70
             64 LOAD_FAST                4 (mid)
             66 STORE_FAST               3 (hi)
             68 JUMP_ABSOLUTE           32

 33     >>   70 LOAD_FAST                4 (mid)
             72 LOAD_CONST               5 (1)
             74 BINARY_ADD
             76 STORE_FAST               2 (lo)
             78 JUMP_ABSOLUTE           32

 34     >>   80 LOAD_FAST                2 (lo)
             82 RETURN_VALUE

Disassembly of <code object insort_left at 0x10694f240, file "/.../lib/python3.8/bisect.py", line 36>:
 45           0 LOAD_GLOBAL              0 (bisect_left)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 46          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

Disassembly of <code object bisect_left at 0x10694f190, file "/.../lib/python3.8/bisect.py", line 49>:
 60           0 LOAD_FAST                2 (lo)
              2 LOAD_CONST               1 (0)
              4 COMPARE_OP               0 (<)
              6 POP_JUMP_IF_FALSE       16

 61           8 LOAD_GLOBAL              0 (ValueError)
             10 LOAD_CONST               2 ('lo must be non-negative')
             12 CALL_FUNCTION            1
             14 RAISE_VARARGS            1

 62     >>   16 LOAD_FAST                3 (hi)
             18 LOAD_CONST               3 (None)
             20 COMPARE_OP               8 (is)
             22 POP_JUMP_IF_FALSE       32

 63          24 LOAD_GLOBAL              1 (len)
             26 LOAD_FAST                0 (a)
             28 CALL_FUNCTION            1
             30 STORE_FAST               3 (hi)

 64     >>   32 LOAD_FAST                2 (lo)
             34 LOAD_FAST                3 (hi)
             36 COMPARE_OP               0 (<)
             38 POP_JUMP_IF_FALSE       80

 65          40 LOAD_FAST                2 (lo)
             42 LOAD_FAST                3 (hi)
             44 BINARY_ADD
             46 LOAD_CONST               4 (2)
             48 BINARY_FLOOR_DIVIDE
             50 STORE_FAST               4 (mid)

 66          52 LOAD_FAST                0 (a)
             54 LOAD_FAST                4 (mid)
             56 BINARY_SUBSCR
             58 LOAD_FAST                1 (x)
             60 COMPARE_OP               0 (<)
             62 POP_JUMP_IF_FALSE       74
             64 LOAD_FAST                4 (mid)
             66 LOAD_CONST               5 (1)
             68 BINARY_ADD
             70 STORE_FAST               2 (lo)
             72 JUMP_ABSOLUTE           32

 67     >>   74 LOAD_FAST                4 (mid)
             76 STORE_FAST               3 (hi)
             78 JUMP_ABSOLUTE           32

 68     >>   80 LOAD_FAST                2 (lo)
             82 RETURN_VALUE(

Note that this is separates out the top level code object , defining the module, and the code objects of functions and classes.请注意,这是将顶级代码对象、定义模块以及函数和类的代码对象分开。 In Python 3.6 and older the dis.dis() function won't recurse .在 Python 3.6 及更早版本中, dis.dis()函数不会 recurse In those versions, if you wanted to analyse the functions contained, you'll need to load the nested code objects from the top-level code.co_consts array.在这些版本中,如果您想分析包含的函数,您需要从顶级code.co_consts数组加载嵌套的code对象。 For example, the insort_right function's code object is loaded with LOAD_CONST 3 , so you look for the code object at that index:例如, insort_right函数的代码对象使用LOAD_CONST 3加载,因此您可以在该索引处查找代码对象:

>>> code.co_consts[3]
<code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>
>>> dis.dis(code.co_consts[3])
 12           0 LOAD_GLOBAL              0 (bisect_right)
              2 LOAD_FAST                0 (a)
              4 LOAD_FAST                1 (x)
              6 LOAD_FAST                2 (lo)
              8 LOAD_FAST                3 (hi)
             10 CALL_FUNCTION            4
             12 STORE_FAST               2 (lo)

 13          14 LOAD_FAST                0 (a)
             16 LOAD_METHOD              1 (insert)
             18 LOAD_FAST                2 (lo)
             20 LOAD_FAST                1 (x)
             22 CALL_METHOD              2
             24 POP_TOP
             26 LOAD_CONST               1 (None)
             28 RETURN_VALUE

I personally would avoid trying to parse the .pyc file with anything other than the matching Python version and marshal module.我个人会避免尝试使用匹配的 Python 版本和marshal模块以外的任何内容解析.pyc文件。 The marshal format is basically an internal serialisation format that changes with the needs of Python itself. marshal格式基本上是一种内部序列化格式,它随着 Python 本身的需要而变化。 New features like list comprehensions and with statements and async / await require new additions to the format, which is not published other than as C source code .列表推导式和with语句和async / await等新功能需要对格式添加新内容,除了作为C 源代码外,不会发布。

If you do go this route, and manage to read a code object by other means than using the module, you'll have to parse out the disassembly from the various attributes of the code object;如果你确实走这条路,并设法通过使用模块以外的其他方式读取code对象,则必须从代码对象的各种属性中解析出反汇编; see the dis module source for details on how to do this (you'll have to use the co_firstlineno and co_lnotab attributes to create a bytecode-offset-to-linenumber map, for example).有关如何执行此操作的详细信息,请参阅dis模块源代码(例如,您必须使用co_firstlinenoco_lnotab属性来创建字节码偏移到行号的映射)。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM