How can I understand the content of a .pyc file?
I have a .pyc file. I need to understand the content of that file to learn how Python's disassembler works, i.e. how can I generate output like dis.dis(function) from the contents of a .pyc file?
For example:
>>> def sqr(x):
...     return x*x
...
>>> import dis
>>> dis.dis(sqr)
2 0 LOAD_FAST 0 (x)
3 LOAD_FAST 0 (x)
6 BINARY_MULTIPLY
7 RETURN_VALUE
I need to get output like this using the .pyc file.
.pyc files contain some metadata and a marshalled code object; to load the code object and disassemble it, use:
import dis, marshal, sys

header_sizes = [
    # (size, first version this applies to)
    # pyc files were introduced in 0.9.2 way, way back in June 1991.
    (8,  (0, 9, 2)),  # 2 bytes magic number, \r\n, 4 bytes UNIX timestamp
    (12, (3, 3)),     # added 4 bytes file size
    # bytes 4-8 are flags, meaning of 9-16 depends on what flags are set
    # bit 0 not set: 9-12 timestamp, 13-16 file size
    # bit 0 set: 9-16 file hash (SipHash-2-4, k0 = 4 bytes of the file, k1 = 0)
    (16, (3, 7)),     # inserted 4 bytes bit flag field at 4-8
    # future version may add more bytes still, at which point we can extend
    # this table. It is correct for Python versions up to 3.9
]
header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)

with open(pycfile, "rb") as f:  # pycfile: path to the .pyc file to read
    metadata = f.read(header_size)  # first header_size bytes are metadata
    code = marshal.load(f)          # rest is a marshalled code object

dis.dis(code)
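To make the header layout described in the comments above more concrete, here is a minimal sketch that splits the header into its fields instead of skipping it. It assumes the 16-byte layout used by Python 3.7 and newer; `read_pyc` and the throwaway `sqr.py` module are hypothetical names used only for illustration.

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile


def read_pyc(path):
    """Split a .pyc written by Python 3.7+ into header fields and code object."""
    with open(path, "rb") as f:
        magic = f.read(4)                        # 2-byte version tag + b"\r\n"
        (flags,) = struct.unpack("<I", f.read(4))
        if flags & 0x1:
            # Hash-based pyc: the next 8 bytes are a hash of the source file.
            details = {"source_hash": f.read(8)}
        else:
            # Timestamp-based pyc: source mtime and source size, 4 bytes each.
            mtime, size = struct.unpack("<II", f.read(8))
            details = {"mtime": mtime, "source_size": size}
        code = marshal.load(f)                   # rest is the marshalled code object
    return magic, flags, details, code


# Demo on a throwaway module compiled by the running interpreter.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sqr.py")
    with open(src, "w") as f:
        f.write("def sqr(x):\n    return x * x\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "sqr.pyc"))
    magic, flags, details, code = read_pyc(pyc)

print(magic == importlib.util.MAGIC_NUMBER, flags, details)
```

Comparing the magic number against `importlib.util.MAGIC_NUMBER` is a cheap way to check that the file was written by the same Python version before handing the rest to `marshal`.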
Demo with the bisect module:
>>> import bisect
>>> import dis, marshal
>>> import sys
>>> header_sizes = [(8, (0, 9, 2)), (12, (3, 3)), (16, (3, 7))]
>>> header_size = next(s for s, v in reversed(header_sizes) if sys.version_info >= v)
>>> pycfile = getattr(bisect, '__cached__', bisect.__file__)
>>> with open(pycfile, "rb") as f:
...     metadata = f.read(header_size)  # first header_size bytes are metadata
...     code = marshal.load(f)          # rest is a marshalled code object
...
>>> dis.dis(code)
1 0 LOAD_CONST 0 ('Bisection algorithms.')
2 STORE_NAME 0 (__doc__)
3 4 LOAD_CONST 12 ((0, None))
6 LOAD_CONST 3 (<code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>)
8 LOAD_CONST 4 ('insort_right')
10 MAKE_FUNCTION 1 (defaults)
12 STORE_NAME 1 (insort_right)
15 14 LOAD_CONST 13 ((0, None))
16 LOAD_CONST 5 (<code object bisect_right at 0x10694f2f0, file "/.../lib/python3.8/bisect.py", line 15>)
18 LOAD_CONST 6 ('bisect_right')
20 MAKE_FUNCTION 1 (defaults)
22 STORE_NAME 2 (bisect_right)
36 24 LOAD_CONST 14 ((0, None))
26 LOAD_CONST 7 (<code object insort_left at 0x10694f240, file "/.../lib/python3.8/bisect.py", line 36>)
28 LOAD_CONST 8 ('insort_left')
30 MAKE_FUNCTION 1 (defaults)
32 STORE_NAME 3 (insort_left)
49 34 LOAD_CONST 15 ((0, None))
36 LOAD_CONST 9 (<code object bisect_left at 0x10694f190, file "/.../lib/python3.8/bisect.py", line 49>)
38 LOAD_CONST 10 ('bisect_left')
40 MAKE_FUNCTION 1 (defaults)
42 STORE_NAME 4 (bisect_left)
71 44 SETUP_FINALLY 12 (to 58)
72 46 LOAD_CONST 1 (0)
48 LOAD_CONST 11 (('*',))
50 IMPORT_NAME 5 (_bisect)
52 IMPORT_STAR
54 POP_BLOCK
56 JUMP_FORWARD 20 (to 78)
73 >> 58 DUP_TOP
60 LOAD_NAME 6 (ImportError)
62 COMPARE_OP 10 (exception match)
64 POP_JUMP_IF_FALSE 76
66 POP_TOP
68 POP_TOP
70 POP_TOP
74 72 POP_EXCEPT
74 JUMP_FORWARD 2 (to 78)
>> 76 END_FINALLY
77 >> 78 LOAD_NAME 2 (bisect_right)
80 STORE_NAME 7 (bisect)
78 82 LOAD_NAME 1 (insort_right)
84 STORE_NAME 8 (insort)
86 LOAD_CONST 2 (None)
88 RETURN_VALUE
Disassembly of <code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>:
12 0 LOAD_GLOBAL 0 (bisect_right)
2 LOAD_FAST 0 (a)
4 LOAD_FAST 1 (x)
6 LOAD_FAST 2 (lo)
8 LOAD_FAST 3 (hi)
10 CALL_FUNCTION 4
12 STORE_FAST 2 (lo)
13 14 LOAD_FAST 0 (a)
16 LOAD_METHOD 1 (insert)
18 LOAD_FAST 2 (lo)
20 LOAD_FAST 1 (x)
22 CALL_METHOD 2
24 POP_TOP
26 LOAD_CONST 1 (None)
28 RETURN_VALUE
Disassembly of <code object bisect_right at 0x10694f2f0, file "/.../lib/python3.8/bisect.py", line 15>:
26 0 LOAD_FAST 2 (lo)
2 LOAD_CONST 1 (0)
4 COMPARE_OP 0 (<)
6 POP_JUMP_IF_FALSE 16
27 8 LOAD_GLOBAL 0 (ValueError)
10 LOAD_CONST 2 ('lo must be non-negative')
12 CALL_FUNCTION 1
14 RAISE_VARARGS 1
28 >> 16 LOAD_FAST 3 (hi)
18 LOAD_CONST 3 (None)
20 COMPARE_OP 8 (is)
22 POP_JUMP_IF_FALSE 32
29 24 LOAD_GLOBAL 1 (len)
26 LOAD_FAST 0 (a)
28 CALL_FUNCTION 1
30 STORE_FAST 3 (hi)
30 >> 32 LOAD_FAST 2 (lo)
34 LOAD_FAST 3 (hi)
36 COMPARE_OP 0 (<)
38 POP_JUMP_IF_FALSE 80
31 40 LOAD_FAST 2 (lo)
42 LOAD_FAST 3 (hi)
44 BINARY_ADD
46 LOAD_CONST 4 (2)
48 BINARY_FLOOR_DIVIDE
50 STORE_FAST 4 (mid)
32 52 LOAD_FAST 1 (x)
54 LOAD_FAST 0 (a)
56 LOAD_FAST 4 (mid)
58 BINARY_SUBSCR
60 COMPARE_OP 0 (<)
62 POP_JUMP_IF_FALSE 70
64 LOAD_FAST 4 (mid)
66 STORE_FAST 3 (hi)
68 JUMP_ABSOLUTE 32
33 >> 70 LOAD_FAST 4 (mid)
72 LOAD_CONST 5 (1)
74 BINARY_ADD
76 STORE_FAST 2 (lo)
78 JUMP_ABSOLUTE 32
34 >> 80 LOAD_FAST 2 (lo)
82 RETURN_VALUE
Disassembly of <code object insort_left at 0x10694f240, file "/.../lib/python3.8/bisect.py", line 36>:
45 0 LOAD_GLOBAL 0 (bisect_left)
2 LOAD_FAST 0 (a)
4 LOAD_FAST 1 (x)
6 LOAD_FAST 2 (lo)
8 LOAD_FAST 3 (hi)
10 CALL_FUNCTION 4
12 STORE_FAST 2 (lo)
46 14 LOAD_FAST 0 (a)
16 LOAD_METHOD 1 (insert)
18 LOAD_FAST 2 (lo)
20 LOAD_FAST 1 (x)
22 CALL_METHOD 2
24 POP_TOP
26 LOAD_CONST 1 (None)
28 RETURN_VALUE
Disassembly of <code object bisect_left at 0x10694f190, file "/.../lib/python3.8/bisect.py", line 49>:
60 0 LOAD_FAST 2 (lo)
2 LOAD_CONST 1 (0)
4 COMPARE_OP 0 (<)
6 POP_JUMP_IF_FALSE 16
61 8 LOAD_GLOBAL 0 (ValueError)
10 LOAD_CONST 2 ('lo must be non-negative')
12 CALL_FUNCTION 1
14 RAISE_VARARGS 1
62 >> 16 LOAD_FAST 3 (hi)
18 LOAD_CONST 3 (None)
20 COMPARE_OP 8 (is)
22 POP_JUMP_IF_FALSE 32
63 24 LOAD_GLOBAL 1 (len)
26 LOAD_FAST 0 (a)
28 CALL_FUNCTION 1
30 STORE_FAST 3 (hi)
64 >> 32 LOAD_FAST 2 (lo)
34 LOAD_FAST 3 (hi)
36 COMPARE_OP 0 (<)
38 POP_JUMP_IF_FALSE 80
65 40 LOAD_FAST 2 (lo)
42 LOAD_FAST 3 (hi)
44 BINARY_ADD
46 LOAD_CONST 4 (2)
48 BINARY_FLOOR_DIVIDE
50 STORE_FAST 4 (mid)
66 52 LOAD_FAST 0 (a)
54 LOAD_FAST 4 (mid)
56 BINARY_SUBSCR
58 LOAD_FAST 1 (x)
60 COMPARE_OP 0 (<)
62 POP_JUMP_IF_FALSE 74
64 LOAD_FAST 4 (mid)
66 LOAD_CONST 5 (1)
68 BINARY_ADD
70 STORE_FAST 2 (lo)
72 JUMP_ABSOLUTE 32
67 >> 74 LOAD_FAST 4 (mid)
76 STORE_FAST 3 (hi)
78 JUMP_ABSOLUTE 32
68 >> 80 LOAD_FAST 2 (lo)
82 RETURN_VALUE
Note that this output separates the top-level code object, which defines the module, from the code objects of functions and classes. In Python 3.6 and older, the dis.dis() function won't recurse. In those versions, if you want to analyse the functions the module contains, you'll need to load the nested code objects from the top-level code.co_consts array. For example, the insort_right function's code object is loaded with LOAD_CONST 3, so you look for the code object at that index:
>>> code.co_consts[3]
<code object insort_right at 0x10694f3a0, file "/.../lib/python3.8/bisect.py", line 3>
>>> dis.dis(code.co_consts[3])
12 0 LOAD_GLOBAL 0 (bisect_right)
2 LOAD_FAST 0 (a)
4 LOAD_FAST 1 (x)
6 LOAD_FAST 2 (lo)
8 LOAD_FAST 3 (hi)
10 CALL_FUNCTION 4
12 STORE_FAST 2 (lo)
13 14 LOAD_FAST 0 (a)
16 LOAD_METHOD 1 (insert)
18 LOAD_FAST 2 (lo)
20 LOAD_FAST 1 (x)
22 CALL_METHOD 2
24 POP_TOP
26 LOAD_CONST 1 (None)
28 RETURN_VALUE
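Rather than looking up each index by hand, the nested code objects can be collected by walking co_consts recursively. The following is a rough sketch of that idea; `iter_code_objects` and the small `source` string are hypothetical names used only for illustration, and `dis.disassemble()` is used because it handles a single code object without recursing on any Python 3 version.

```python
import dis
import types


def iter_code_objects(code):
    """Yield code, then every code object nested in its co_consts, depth first."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


# A tiny module with a nested function, compiled in memory for the demo.
source = "def outer():\n    def inner():\n        return 1\n    return inner\n"
module_code = compile(source, "<example>", "exec")

for co in iter_code_objects(module_code):
    print("Disassembly of", co.co_name)
    dis.disassemble(co)  # one code object at a time, no recursion
```

The same generator works on the code object loaded from a .pyc file, since that is an ordinary code object once marshal has loaded it.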
I personally would avoid trying to parse the .pyc file with anything other than the matching Python version and marshal module. The marshal format is basically an internal serialisation format that changes with the needs of Python itself. New features like list comprehensions, with statements, and async / await require new additions to the format, which is not published other than as C source code.
If you do go this route, and manage to read a code object by means other than the marshal module, you'll have to parse out the disassembly from the various attributes of the code object; see the dis module source for details on how to do this (you'll have to use the co_firstlineno and co_lnotab attributes to create a bytecode-offset-to-line-number map, for example).
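As a rough illustration of what building the disassembly from code object attributes involves, the sketch below walks co_code directly and attaches line numbers to offsets. It makes several assumptions: it relies on the 2-byte "wordcode" layout used since Python 3.6, it uses `dis.findlinestarts()` for the offset-to-line map instead of decoding co_lnotab by hand, and it does not fold EXTENDED_ARG prefixes into the following argument. `rough_disassemble` is a hypothetical helper name.

```python
import dis


def sqr(x):
    return x * x


def rough_disassemble(code):
    """Return (line, offset, opname, arg) tuples for a code object.

    Assumes 2 bytes per instruction (Python 3.6+); EXTENDED_ARG is not folded.
    """
    line_starts = dict(dis.findlinestarts(code))  # offset -> line number
    raw = code.co_code
    rows, line = [], None
    for offset in range(0, len(raw), 2):
        # Carry the last seen line forward until a new line starts.
        line = line_starts.get(offset, line)
        rows.append((line, offset, dis.opname[raw[offset]], raw[offset + 1]))
    return rows


for row in rough_disassemble(sqr.__code__):
    print(row)
```

The opcodes and offsets printed depend on the Python version that runs the code. A full disassembler also resolves each argument against co_consts, co_names, and co_varnames, which is what the dis module source does.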