
Python str() function returns byte string

As far as I can tell, Python's str() function should by default return a UTF-8 encoded string. However, unless I explicitly specify the encoding as UTF8, I get a byte string. Should I set a global somewhere to make the default active, or what am I doing wrong? Python 3.10.6 on Fedora 36/XFCE.

#!/usr/bin/python3

# Get the mount point of /dev/sd* mounts.
import subprocess

str2=subprocess.check_output(['cat', '/proc/mounts'])
mounts=str2.splitlines()

#print (mounts)

for x in range(len(mounts)):
    test = str(mounts[x], encoding="UTF8")
    if test[0:7] == '/dev/sd':
        print (test)

The above gives 'test' starting with /dev/sd, but if I omit the encoding, the string starts with b'/dev/.

In Python 3, str() always returns a Unicode string. It is NOT an encoded byte string.

The output of subprocess is a byte string. If you do str(b"1234"), the result is a Unicode string that happens to contain the b prefix: "b'1234'". That's NOT a 4-byte byte string. That's a 7-character Unicode string.
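A quick way to see this for yourself is a minimal sketch like the one below. The variable names are only illustrative. With no encoding argument, str() on a bytes object gives you the same text as repr(), quotes and b prefix included.

```python
raw = b"1234"        # a bytes object, 4 bytes long
as_text = str(raw)   # no encoding given, so str() falls back to repr(raw)

print(as_text)                # b'1234'
print(len(raw))               # 4
print(len(as_text))           # 7
print(as_text == repr(raw))   # True
```

So the b'...' you see is not a marker of a byte string; it is literally part of the text in the new str.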

When you do str(b"1234", encoding="UTF8"), that converts the byte string to Unicode, exactly like saying b"1234".decode('UTF8'), which is the usual way to write what you have written.
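Applied to your script, that could look roughly like the sketch below. The sample bytes and the sd_mounts helper are made up for illustration (they stand in for the real check_output result so the snippet runs anywhere); the idea is simply to decode once and then work with ordinary strings.

```python
# Hypothetical sample standing in for the bytes returned by
# subprocess.check_output(['cat', '/proc/mounts']).
sample = (
    b"/dev/sda1 / ext4 rw 0 0\n"
    b"proc /proc proc rw 0 0\n"
    b"/dev/sdb1 /mnt/usb vfat rw 0 0\n"
)

# Both spellings produce the same str.
assert str(sample, encoding="UTF8") == sample.decode("UTF8")


def sd_mounts(raw: bytes) -> list[str]:
    """Decode /proc/mounts-style output and keep only the /dev/sd* lines."""
    text = raw.decode("utf-8")  # bytes -> str, done once for the whole output
    return [line for line in text.splitlines() if line.startswith("/dev/sd")]


for line in sd_mounts(sample):
    print(line)
```

If you would rather not decode by hand, subprocess.check_output also accepts text=True (Python 3.7+), in which case it returns a str decoded with the locale's default encoding instead of bytes.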

