
Why does indexing a binary string return an integer in Python 3?

Given a binary string (a bytes object) in Python 3 like

bstring = b'hello'

why does bstring[0] return the ASCII code for the character 'h' (104) and not the bytes object b'h' (or b'\x68')?

It's probably also worth noting that b'h' == 104 returns False (this cost me about 2 hours of debugging, so I'm a little annoyed).

Because bytes are not characters.

Indexing returns the value of the byte at that position, as an integer.
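As a small sketch of the consequence (the variable names here are only illustrative): indexing gives an int, while a one-element slice gives a bytes object of length 1, which is probably what the question expected. It also shows why b'h' == 104 is False and how to compare like with like.

```python
bstring = b'hello'

first_int = bstring[0]        # indexing yields an int
first_bytes = bstring[0:1]    # slicing yields a bytes object of length 1

# A bytes object never compares equal to an int, hence the False result.
int_vs_bytes = (b'h' == 104)

# Convert one side so both operands have the same type.
same_as_int = (first_int == ord('h'))
same_as_bytes = (bytes([first_int]) == b'h')

print(first_int, first_bytes, int_vs_bytes, same_as_int, same_as_bytes)
```

Note that bytes([104]) takes an iterable of integers; bytes(104) would instead create 104 zero bytes.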

If you take 'hello', this is quite simple: 5 ASCII characters -> 5 bytes:

b'hello' == 'hello'.encode('utf-8')
# True

len('hello'.encode('utf-8'))
# 5

If you were to use non-ASCII characters, each of those could be encoded as several bytes, and indexing could give you only part of a character:

len('å'.encode('utf-8'))
# 2

'å'.encode('utf-8')[0]
# 195

'å'.encode('utf-8')[1]
# 165
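To make "only part of a character" concrete, here is a minimal sketch: decoding just the first of the two bytes fails, while decoding both together gives the original character back. The names encoded, partial_ok and whole are made up for this example.

```python
encoded = 'å'.encode('utf-8')    # two bytes for one character

try:
    encoded[:1].decode('utf-8')  # only the first half of the character
    partial_ok = True
except UnicodeDecodeError:
    partial_ok = False           # a lone lead byte is not valid UTF-8

whole = encoded.decode('utf-8')  # both bytes together decode fine
print(partial_ok, whole)
```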

Think of bytes less as a "string" and more as an immutable list (or tuple) with the constraint that all elements be integers in range(256).

So, think of:

>>> bstring = b'hello'
>>> bstring[0]
104

as being equivalent to

>>> btuple = (104, 101, 108, 108, 111)
>>> btuple[0]
104

except with a different sequence type.
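The analogy can be checked directly. This is only a sketch with illustrative variable names: it converts between the two sequence types and shows that bytes rejects both out-of-range values and item assignment.

```python
bstring = b'hello'
btuple = (104, 101, 108, 108, 111)

same_elements = (tuple(bstring) == btuple)   # bytes -> tuple of ints
round_trip = (bytes(btuple) == bstring)      # tuple of ints -> bytes

try:
    bytes([256])                 # 256 is outside range(256)
    accepts_256 = True
except ValueError:
    accepts_256 = False

try:
    bstring[0] = 72              # bytes is immutable, like a tuple
    is_mutable = True
except TypeError:
    is_mutable = False

print(same_elements, round_trip, accepts_256, is_mutable)
```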

It's actually str that behaves unusually in Python. If you index a str, you don't get a char object like you would in some other languages; you get another str of length 1.

>>> string = 'hello'
>>> string[0]
'h'
>>> type(string[0])
<class 'str'>
>>> string[0][0]
'h'
>>> string[0][0][0]
'h'
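The same contrast shows up when iterating rather than indexing. In this sketch (names are illustrative), a str yields length-1 str objects and a bytes object yields ints, and ord and chr convert between the two views for ASCII text.

```python
chars = list('hello')     # each element is a str of length 1
ints = list(b'hello')     # each element is an int in range(256)

# ord maps a one-character str to its code point; chr goes the other way.
from_chars = [ord(c) for c in chars]
from_ints = [chr(i) for i in ints]

print(chars, ints)
print(from_chars == ints, from_ints == chars)
```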
