
Why do string comparison and identity behave differently in pdb and the Python console?

I ran the same snippet of Python code in the Python console and in pdb, but I got different results, as shown below:

pdb:

>>> import pdb
>>> pdb.set_trace()
(Pdb) print u'你好' == u'\u4f60\u597d'
False
(Pdb) print u'你好' is u'\u4f60\u597d'
False
(Pdb) print id(u'你好'), id(u'\u4f60\u597d')
4431713024 4431713120
(Pdb) id(u'你好')
4431713024
(Pdb) id(u'\u4f60\u597d')
4431713024

python console:

>>> print u'你好' == u'\u4f60\u597d'
True
>>> print u'你好' is u'\u4f60\u597d'
True
>>> print id(u'你好'), id(u'\u4f60\u597d')
4376711984 4376711984
>>> id(u'你好')
4376711984
>>> id(u'\u4f60\u597d')
4376711984

My Python version is 2.7.13.

So my questions:

1. Why do operators (like '==' and 'is') behave differently in the two consoles?

2. In pdb, why does id(u'\u4f60\u597d') equal 4431713120 in

print id(u'你好'), id(u'\u4f60\u597d')

but 4431713024 in

id(u'\u4f60\u597d')

3. Why does this situation not occur in Python 3?

Let's start with the is checks, because that is slightly easier to answer.

Note that when you check the ids on two separate lines, both the interpreter and the debugger show the same id for both strings. This is because the first string is created at some address and its id is printed. By the time the second line runs, nothing references the first string any more, so it has been garbage collected and its memory freed. The newly created string takes the first free memory slot, which just happens to be the one that was just released. It therefore has the same id that the first string had while it was alive.
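The lifetime argument above can be sketched in a few lines. This is only an illustration: the class Temp and both helper functions are hypothetical names, and the address reuse in the second helper is a CPython implementation detail that is common but not guaranteed.

```python
class Temp(object):
    """Throwaway class; instances stand in for the temporary strings."""
    pass


def ids_while_both_alive():
    # Both objects are referenced at the same time, so their ids must differ.
    a = Temp()
    b = Temp()
    return id(a), id(b)


def ids_of_temporaries():
    # Each Temp() is unreferenced as soon as id() returns, so CPython is
    # free to reuse the same address for the next one (not guaranteed).
    first = id(Temp())
    second = id(Temp())
    return first, second


both = ids_while_both_alive()
temps = ids_of_temporaries()

print(both[0] != both[1])    # True
print(temps[0] == temps[1])  # frequently True in CPython, but do not rely on it
```

An id is only unique among objects that are alive at the same moment, which is why comparing ids obtained on separate lines says nothing about whether the two strings were the same object.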

When checking the ids on the same line, things happen differently, because both strings exist at the same time. Here the interpreter and the debugger differ in their behavior: the interpreter interns the strings, so they are the same object and therefore have the same id, while the debugger does not. (See "Python string interning", as recommended by @DeepSpace in the comments, for more information on interning.)
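To make the difference between equality and identity concrete, here is a minimal sketch of explicit interning. It assumes Python 3, where the function lives at sys.intern (the question itself uses Python 2.7), and the variable names are made up for illustration.

```python
import sys

# Build two equal strings at runtime so they start out as separate objects.
a = ''.join(['py', 'thon'])
b = ''.join(['py', 'thon'])

print(a == b)  # True: same characters

# Interning maps equal strings to one shared object.
a_interned = sys.intern(a)
b_interned = sys.intern(b)

print(a_interned is b_interned)          # True: same object
print(id(a_interned) == id(b_interned))  # True
```

Whether a is b holds before interning depends on the implementation, which is why == is the right operator for comparing string contents.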

I think the reason the debugger does not intern them can actually be seen in the first test, u'你好' == u'\u4f60\u597d'. The two literals are represented differently in the debugger than in the interpreter, so the debugger cannot intern them (it thinks they are two different strings).

The debugger assigns different code points to the two strings:

(Pdb) map(ord, u'你好')
[228, 189, 160, 229, 165, 189]
(Pdb) map(ord, u'\u4f60\u597d')
[20320, 22909]

The interpreter, on the other hand, assigns the same code points to both:

>>> map(ord, u'你好')
[20320, 22909]
>>> map(ord, u'\u4f60\u597d')
[20320, 22909]
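The six values the debugger reports are exactly the UTF-8 bytes of 你好. The sketch below reproduces them by encoding the string as UTF-8 and then decoding each byte as a separate character. That this is what pdb does internally is an assumption, not something shown above; the snippet only demonstrates that such a mis-decoding yields the same numbers. The variable names are made up for illustration.

```python
text = u'\u4f60\u597d'  # the two characters of the original literal

# Encode to UTF-8, then (wrongly) decode one byte per character.
utf8_bytes = text.encode('utf-8')
misdecoded = utf8_bytes.decode('latin-1')

print([ord(ch) for ch in text])        # [20320, 22909]
print([ord(ch) for ch in misdecoded])  # [228, 189, 160, 229, 165, 189]
print(text == misdecoded)              # False
```

A six-character string and a two-character string can never compare equal, which matches the False that pdb prints for the == test.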

As to why the debugger represents the two literals differently, that question needs to be answered by someone else.
