
Why does the `is` operator behave differently in a script vs the REPL?

In Python, the same code gives different results depending on how it is run. As a script:

a = 300
b = 300
print(a == b)   # prints True
print(a is b)   # prints True
print("id(a) = %d, id(b) = %d" % (id(a), id(b)))  # they have the same address

But in the interactive shell (REPL):

>>> a = 300
>>> b = 300
>>> a is b
False
>>> id(a)
4501364368
>>> id(b)
4501362224

The `is` operator gives a different result in each case.

When you run code in a .py script, the entire file is compiled into a code object before execution. In this case, CPython is able to make certain optimizations, such as reusing the same instance for the integer 300.

You can also reproduce this in the REPL by executing the code in a context that more closely resembles the execution of a script:

>>> source = """\ 
... a = 300 
... b = 300 
... print (a==b) 
... print (a is b)## print True 
... print ("id(a) = %d, id(b) = %d"%(id(a), id(b))) ## They have same address 
... """
>>> code_obj = compile(source, filename="myscript.py", mode="exec")
>>> exec(code_obj) 
True
True
id(a) = 140736953597776, id(b) = 140736953597776
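
One way to check this yourself, as a rough sketch: compile the two assignments as a single unit and inspect the resulting code object's `co_consts` tuple. The filename `<sketch>` and the variable names below are made up for illustration, and the outcome is a CPython implementation detail, not a language guarantee.

```python
# Compile both assignments as one unit, the way a script is compiled.
source = "a = 300\nb = 300\n"
code_obj = compile(source, filename="<sketch>", mode="exec")

# Constants used by the module body. On CPython, equal literals within one
# code object share a single entry, so 300 should appear only once.
consts = code_obj.co_consts
count_300 = sum(1 for c in consts if type(c) is int and c == 300)

# Run the code object and compare the two names by identity.
namespace = {}
exec(code_obj, namespace)
same_object = namespace["a"] is namespace["b"]

print(consts)
print("300 appears", count_300, "time(s) in co_consts")
print("a is b:", same_object)
```

Because both names are loaded from the same entry in `co_consts`, they end up bound to the same object.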

Some of these optimizations are pretty aggressive. You could change the script line b = 300 to b = 150 + 150, and CPython would still "fold" b into the same constant. If you're interested in such implementation details, look in peephole.c and search for PyCode_Optimize and any information about the "consts table". (Where this logic lives depends on the CPython version; in newer releases constant folding has moved out of peephole.c.)
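
A hedged sketch of the folding claim: compile `b = 150 + 150` on its own and look at the constants the compiler kept. The filename `<folding-sketch>` is hypothetical. Whether the folded result shares an object with a literal `300` can depend on the CPython version, so the identity check is printed, not assumed.

```python
# Compile the expression alone, so any 300 in co_consts must come from folding.
folded_code = compile("b = 150 + 150\n", filename="<folding-sketch>", mode="exec")
folded = 300 in folded_code.co_consts

# Now compile the literal and the expression together and compare the results.
both_code = compile("a = 300\nb = 150 + 150\n", filename="<folding-sketch>", mode="exec")
namespace = {}
exec(both_code, namespace)

print("constants for 'b = 150 + 150':", folded_code.co_consts)
print("300 was computed at compile time:", folded)
print("a == b:", namespace["a"] == namespace["b"])
print("a is b:", namespace["a"] is namespace["b"])
```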

In contrast, when you run code line by line directly in the REPL, it executes in a different context. Each line is compiled separately in "single" mode, so this optimization is not available.

>>> scope = {} 
>>> lines = source.splitlines()
>>> for line in lines: 
...     code_obj = compile(line, filename="<I'm in the REPL>", mode="single")
...     exec(code_obj, scope) 
...
True
False
id(a) = 140737087176016, id(b) = 140737087176080
>>> scope['a'], scope['b']
(300, 300)
>>> id(scope['a']), id(scope['b'])
(140737087176016, 140737087176080)

There are actually two things to know about CPython's behavior here. First, small integers in the range [-5, 256] are cached (interned) by the interpreter. So any value in that range refers to the same object and has the same id, even at the REPL:

>>> a = 100
>>> b = 100
>>> a is b
True

Since 300 > 256, it is not interned:

>>> a = 300
>>> b = 300
>>> a is b
False
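
To observe the small-integer cache without compile-time constant sharing getting in the way, one option is to build the integers at runtime, for example by parsing strings with `int()`. This is only a sketch of CPython's behavior; the helper name `same_object_at_runtime` is invented here, and other implementations may behave differently.

```python
def same_object_at_runtime(text):
    """Parse the same text twice and report whether both results are one object."""
    first = int(text)
    second = int(text)
    return first is second


# Values on both sides of the cache boundaries described above.
for text in ("-6", "-5", "100", "256", "257", "300"):
    print(text, same_object_at_runtime(text))
```

On CPython, values inside [-5, 256] come back as the same cached object, while values outside that range are created fresh on each call.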

Second, in a script, literals are stored in the constants section of the compiled code. CPython is smart enough to realize that, since both a and b refer to the literal 300 and 300 is an immutable object, it can reference the same constant slot for both. If you tweak your script a bit and write it as:

def foo():
    a = 300
    b = 300
    print(a==b)
    print(a is b)
    print("id(a) = %d, id(b) = %d" % (id(a), id(b)))


import dis
dis.disassemble(foo.__code__)

The beginning of the output looks like this:

2           0 LOAD_CONST               1 (300)
            2 STORE_FAST               0 (a)

3           4 LOAD_CONST               1 (300)
            6 STORE_FAST               1 (b)

...

As you can see, CPython loads a and b from the same constant slot. This means that a and b refer to the same object, which is why a is b is True in the script but not at the REPL.

You can see this behavior in the REPL too if you wrap your statements in a function:

>>> import dis
>>> def foo():
...   a = 300
...   b = 300
...   print(a==b)
...   print(a is b)
...   print("id(a) = %d, id(b) = %d" % (id(a), id(b)))
...
>>> foo()
True
True
id(a) = 4369383056, id(b) = 4369383056
>>> dis.disassemble(foo.__code__)
  2           0 LOAD_CONST               1 (300)
              2 STORE_FAST               0 (a)

  3           4 LOAD_CONST               1 (300)
              6 STORE_FAST               1 (b)
# snipped...
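
The same check can be scripted as well as read off the disassembly. The following sketch assumes CPython and uses a hypothetical function `foo_sketch`; it collects the `LOAD_CONST` instructions that load 300 and compares their constant-slot indices.

```python
import dis


def foo_sketch():
    a = 300
    b = 300
    return a is b


# Collect the constant-slot index of every LOAD_CONST that loads 300.
slots = [
    ins.arg
    for ins in dis.get_instructions(foo_sketch)
    if ins.opname == "LOAD_CONST" and ins.argval == 300
]

print("constant slots used for 300:", slots)
print("co_consts:", foo_sketch.__code__.co_consts)
print("a is b inside the function:", foo_sketch())
```

If both entries in `slots` are the same index, the two assignments load the very same constant object.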

Bottom line: while CPython makes these optimizations at times, you shouldn't count on them. They are an implementation detail, and one that has changed over time (CPython used to only do this for integers up to 100, for example). If you're comparing numbers, use ==. :-)
