Convert bytes to a string

Question

I captured the standard output of an external program into a bytes object:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>>
>>> command_stdout
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

I want to convert it to a normal Python string, so that I can print it like this:

>>> print(command_stdout)
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

I tried the binascii.b2a_qp() method, but got the same bytes object back:

>>> binascii.b2a_qp(command_stdout)
b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n'

How do I convert a bytes object to a str with Python 3?

Answer 1

Decode the bytes object to produce a string:

>>> b"abcde".decode("utf-8") 
'abcde'

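The example above assumes that the bytes object is UTF-8, because that is a common encoding. However, you should use the encoding your data is actually in!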
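As a small illustration of why the encoding matters, the sketch below decodes the same bytes with two different codecs. The sample bytes are an assumption chosen for the demo, not data from the question.

```python
# The same two bytes mean different things under different encodings.
raw = "\u00e9".encode("utf-8")     # b'\xc3\xa9', the UTF-8 encoding of 'é'

as_utf8 = raw.decode("utf-8")      # right codec: one character
as_latin1 = raw.decode("latin-1")  # wrong codec: two characters, and no error is raised

print(repr(as_utf8))
print(repr(as_latin1))
```

Neither call fails, which is why a wrong guess can go unnoticed.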

Answer 2

Decode the byte string to turn it into a character (Unicode) string.

Python 3:

encoding = 'utf-8'
b'hello'.decode(encoding)

or

str(b'hello', encoding)

Python 2:

encoding = 'utf-8'
'hello'.decode(encoding)

or

unicode('hello', encoding)

Answer 3

This joins a list of byte values into a string:

>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'
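Mapping chr() over the values treats every byte as its own code point, which gives the same result as decoding with latin-1. A minimal sketch of the difference, assuming the values are meant to be UTF-8 (the second sample list is made up for the demo):

```python
byte_values = [112, 52, 52]

# chr() per byte treats every byte as its own code point (same as latin-1).
joined = "".join(map(chr, byte_values))

# bytes() accepts an iterable of ints in range(256); decode() then applies a real codec.
decoded = bytes(byte_values).decode("utf-8")

# For multi-byte UTF-8 sequences the two approaches give different results.
utf8_values = [195, 169]                        # UTF-8 encoding of 'é'
per_byte = "".join(map(chr, utf8_values))       # two characters
as_text = bytes(utf8_values).decode("utf-8")    # one character
```

For pure ASCII input both approaches agree, so the difference only shows up with non-ASCII data.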

Answer 4

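If you don't know the encoding, then to read binary input into a string in a way that is compatible with both Python 3 and Python 2, use the ancient MS-DOS CP437 encoding:

import sys

PY3K = sys.version_info >= (3, 0)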

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('cp437'))

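Because the encoding is unknown, expect non-English symbols to be translated to cp437 characters (English characters are not translated, because they match in most single-byte encodings and in UTF-8).

Decoding arbitrary binary input as UTF-8 is unsafe, because you may get this: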

>>> b'\x00\x01\xffsd'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 2: invalid start byte

The same applies to latin-1, which was popular (the default?) for Python 2. See the missing points in Codepage Layout - that is where Python chokes with the infamous "ordinal not in range" error.

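UPDATE 20150604: There are rumors that Python 3 has the surrogateescape error strategy for encoding stuff into binary data without data loss or crashes, but it needs conversion tests, [binary] -> [str] -> [binary], to validate both performance and reliability.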
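The [binary] -> [str] -> [binary] round trip mentioned above can be sketched as follows. This is only a small check on a made-up sample (every possible byte value once), not a performance or reliability test.

```python
data = bytes(range(256))  # every possible byte value; not valid UTF-8 as a whole

# Undecodable bytes become lone surrogates (U+DC80..U+DCFF) instead of raising.
text = data.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler turns those surrogates back into the original bytes.
restored = text.encode("utf-8", errors="surrogateescape")

print(restored == data)
```

Note that the intermediate str contains lone surrogates, so encoding it with the default strict handler (for example when printing) raises UnicodeEncodeError.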

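UPDATE 20170116: Thanks to a comment by Nearoo - it is also possible to slash-escape all unknown bytes with the backslashreplace error handler. That works only for Python 3, so even with this workaround you will still get inconsistent output from different Python versions:

import sys

PY3K = sys.version_info >= (3, 0)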

lines = []
for line in stream:
    if not PY3K:
        lines.append(line)
    else:
        lines.append(line.decode('utf-8', 'backslashreplace'))

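See Python's Unicode Support for details.

UPDATE 20170119: I decided to implement a slash-escaping decode that works for both Python 2 and Python 3. It should be slower than the cp437 solution, but it should produce identical results on every Python version.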

# --- preparation

import codecs

def slashescape(err):
    """Codecs error handler. err is a UnicodeDecodeError instance. Return
    a tuple with a replacement for the undecodable part of the input
    and the position where decoding should continue."""
    # bytearray yields ints on both Python 2 and 3, and the offending
    # span may be longer than one byte, so escape every byte in it
    thebytes = bytearray(err.object[err.start:err.end])
    repl = u''.join(u'\\x%02x' % b for b in thebytes)
    return (repl, err.end)

codecs.register_error('slashescape', slashescape)

# --- processing

stream = [b'\x80abc']

lines = []
for line in stream:
    lines.append(line.decode('utf-8', 'slashescape'))

Answer 5

In Python 3, the default encoding is "utf-8", so you can directly use:

b'hello'.decode()

which is equivalent to

b'hello'.decode(encoding="utf-8")

On the other hand, in Python 2, the encoding defaults to the default string encoding. Thus, you should use:

b'hello'.decode(encoding)

where encoding is the encoding you want.

Note: support for keyword arguments was added in Python 2.7.

Answer 6

I think you actually want this:

>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')

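Aaron's answer was correct, except that you need to know which encoding to use. And I believe that Windows uses 'windows-1252'. It only matters if you have some unusual (non-ASCII) characters in your content, but then it makes a difference.

By the way, the fact that it does matter is the reason Python moved to using two different types for binary and text data: it can't convert magically between them, because it doesn't know the encoding unless you tell it! The only way you would know is to read the Windows documentation (or read it here).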

Answer 7

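Since this question is actually asking about subprocess output, you have a more direct approach available. The most modern would be using subprocess.check_output and passing text=True (Python 3.7+) to automatically decode stdout using the system default encoding: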

text = subprocess.check_output(["ls", "-l"], text=True)

For Python 3.6, Popen accepts an encoding keyword:

>>> from subprocess import Popen, PIPE
>>> text = Popen(['ls', '-l'], stdout=PIPE, encoding='utf-8').communicate()[0]
>>> type(text)
<class 'str'>
>>> print(text)
total 0
-rw-r--r-- 1 wim badger 0 May 31 12:45 some_file.txt

The general answer to the question in the title, if you're not dealing with subprocess output, is to decode bytes to text:

>>> b'abcde'.decode()
'abcde'

With no argument, sys.getdefaultencoding() will be used. If your data is not in sys.getdefaultencoding(), then you must specify the encoding explicitly in the decode call:

>>> b'caf\xe9'.decode('cp1250')
'café'

Answer 8

Set universal_newlines to True, i.e.

command_stdout = Popen(['ls', '-l'], stdout=PIPE, universal_newlines=True).communicate()[0]

Answer 9

To interpret a byte sequence as text, you have to know the corresponding character encoding:

unicode_text = bytestring.decode(character_encoding)

Example:

>>> b'\xc2\xb5'.decode('utf-8')
'µ'

The ls command may produce output that can't be interpreted as text. File names on Unix may be any sequence of bytes except slash b'/' and zero b'\0':

>>> open(bytes(range(0x100)).translate(None, b'\0/'), 'w').close()

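Trying to decode such byte soup using the utf-8 encoding raises UnicodeDecodeError.

It can be worse. If you use a wrong, incompatible encoding, the decoding may fail silently and produce mojibake: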

>>> '—'.encode('utf-8').decode('cp1252')
'â€”'

The data is corrupted, but your program remains unaware that a failure has occurred.
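When the wrong codec is known, this kind of mojibake can sometimes be undone by reversing the two steps. A minimal sketch, assuming the text was UTF-8 mis-decoded as cp1252 and that every byte involved is defined in cp1252 (a few byte values, such as 0x81, are not, and then the reversal fails):

```python
original = "\u2014"                                    # EM DASH
garbled = original.encode("utf-8").decode("cp1252")    # wrong codec, no exception

# Reverse the mistake: back to the raw bytes, then decode with the right codec.
repaired = garbled.encode("cp1252").decode("utf-8")

print(repr(garbled), repr(repaired))
```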

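In general, the character encoding to use is not embedded in the byte sequence itself. You have to communicate this information out-of-band. Some outcomes are more likely than others, and therefore the chardet module exists, which can guess the character encoding. A single Python script may use multiple character encodings in different places.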
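Without a detection library, a crude form of guessing is to try a short list of candidate encodings in order. The helper below is hypothetical (its name and the candidate order are assumptions for illustration), and a successful decode does not prove the guess is right.

```python
def decode_with_candidates(data, candidates=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes without error.

    Hypothetical helper for illustration; the candidate order is an assumption.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


print(decode_with_candidates("caf\u00e9".encode("utf-8")))
print(decode_with_candidates(b"caf\xe9"))
```

Because latin-1 maps every byte value to a character, it never fails, so it only makes sense as the last candidate.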

ls output can be converted to a Python string using the os.fsdecode() function, which succeeds even for undecodable filenames (on Unix it uses sys.getfilesystemencoding() and the surrogateescape error handler):

import os
import subprocess

output = os.fsdecode(subprocess.check_output('ls'))

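To get the original bytes, you could use os.fsencode().

If you pass the universal_newlines=True parameter, then subprocess uses locale.getpreferredencoding(False) to decode bytes, e.g., it can be cp1252 on Windows.

To decode the byte stream on the fly, io.TextIOWrapper() can be used.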
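A minimal sketch of decoding on the fly with io.TextIOWrapper(); here io.BytesIO stands in for a real binary stream such as a pipe, and the sample content is made up:

```python
import io

# io.BytesIO stands in for any binary stream, such as a pipe or a file opened in 'rb' mode.
byte_stream = io.BytesIO("first line\nsecond line: \u00b5\n".encode("utf-8"))

# TextIOWrapper decodes incrementally as lines are requested.
text_stream = io.TextIOWrapper(byte_stream, encoding="utf-8")
lines = [line.rstrip("\n") for line in text_stream]

print(lines)
```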

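Different commands may use different character encodings for their output, e.g., the dir internal command (cmd) may use cp437. To decode its output, you can pass the encoding explicitly (Python 3.6+):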

output = subprocess.check_output('dir', shell=True, encoding='cp437')

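The file names may differ from os.listdir() (which uses the Windows Unicode API), e.g., '\xb6' can be substituted with '\x14'. Python's cp437 codec maps b'\x14' to the control character U+0014 instead of U+00B6 (¶). To support filenames with arbitrary Unicode characters, see "Decode PowerShell output possibly containing non-ASCII Unicode characters into a Python string".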

Answer 10

While @Aaron Maenpaa's answer just works, a user recently asked:

Is there any simpler way? 'fhand.read().decode("ASCII")' [...] It's so long!

You can use:

command_stdout.decode()

decode() has default arguments:

codecs.decode(obj, encoding='utf-8', errors='strict')

Answer 11

If you get the following when trying decode():

AttributeError: 'str' object has no attribute 'decode'

you can also specify the encoding type directly in a cast:

>>> my_byte_str
b'Hello World'

>>> str(my_byte_str, 'utf-8')
'Hello World'

Answer 12

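If you get this error:

utf-8 codec can't decode byte 0x8a,

then it is better to use the following code to convert bytes to a string:

# names chosen so that the built-in bytes type is not shadowed
data = b"abcdefg"
text = data.decode("utf-8", "ignore")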
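Note that "ignore" silently drops the bytes it cannot decode. The sketch below compares it with two other built-in error handlers on a made-up sample, so the trade-off is visible:

```python
data = b"abc\x8adef"  # 0x8a is not valid as a standalone UTF-8 byte

ignored = data.decode("utf-8", errors="ignore")             # drops the byte
replaced = data.decode("utf-8", errors="replace")           # inserts U+FFFD
escaped = data.decode("utf-8", errors="backslashreplace")   # keeps it visible as \x8a

print(repr(ignored), repr(replaced), repr(escaped))
```

Decoding with "backslashreplace" needs Python 3.5 or later.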

Answer 13

I made a function to clean a list:

def cleanLists(self, lista):
    lista = [x.strip() for x in lista]
    lista = [x.replace('\n', '') for x in lista]
    lista = [x.replace('\b', '') for x in lista]
    lista = [x.encode('utf8') for x in lista]
    lista = [x.decode('utf8') for x in lista]

    return lista

Answer 14

For Python 3, this is a safer and more Pythonic approach to convert from bytes to str:

def byte_to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes): # Check if it's in bytes
        print(bytes_or_str.decode('utf-8'))
    else:
        print("Object not of byte type")

byte_to_str(b'total 0\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1\n-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2\n')

Output:

total 0
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file1
-rw-rw-r-- 1 thomas thomas 0 Mar  3 07:03 file2

Answer 15

When working with data from Windows systems (with \r\n line endings), my answer is

String = Bytes.decode("utf-8").replace("\r\n", "\n")

Why? Try this with a multiline Input.txt:

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)

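All your line endings will be doubled (to \r\r\n), leading to extra empty lines. Python's text-read functions usually normalize line endings so that strings use only \n. If you receive binary data from a Windows system, Python does not have a chance to do that. Thus,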

Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)

will replicate your original file.
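An alternative sketch, assuming the goal is to write the decoded text back unchanged: passing newline="" to open() disables newline translation on write, so the \r\n sequences are written exactly once. The sample data and the temporary path are made up for the demo.

```python
import os
import tempfile

data = b"line one\r\nline two\r\n"     # bytes as received from a Windows system
text = data.decode("utf-8")            # '\r\n' is still present in the str

path = os.path.join(tempfile.mkdtemp(), "Output.txt")
# newline="" tells open() to write '\n' and '\r' exactly as given, with no translation.
with open(path, "w", encoding="utf-8", newline="") as f:
    f.write(text)

with open(path, "rb") as f:
    written = f.read()

print(written == data)
```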

Answer 16

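From the sys module documentation (System-specific parameters and functions):

To write or read binary data from/to the standard streams, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').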

Answer 17

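For the specific case of "run a shell command and get its output as text instead of bytes", on Python 3.7 you should use subprocess.run and pass in text=True (as well as capture_output=True to capture the output):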

command_result = subprocess.run(["ls", "-l"], capture_output=True, text=True)
command_result.stdout  # is a `str` containing your program's stdout

text used to be called universal_newlines, and was changed (well, aliased) in Python 3.7. If you want to support Python versions before 3.7, pass in universal_newlines=True instead of text=True.
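A minimal sketch of the difference between the two modes, using the older universal_newlines spelling so it also runs before Python 3.7. The child command is an assumption: sys.executable is used instead of ls so the example does not depend on the platform.

```python
import subprocess
import sys

# sys.executable is used instead of `ls` so the sketch runs on any platform.
command = [sys.executable, "-c", "print('hello')"]

# Without text mode, stdout is a bytes object.
as_bytes = subprocess.run(command, stdout=subprocess.PIPE).stdout

# With universal_newlines=True (text=True on 3.7+), stdout is decoded to str.
as_text = subprocess.run(command, stdout=subprocess.PIPE, universal_newlines=True).stdout

print(type(as_bytes), type(as_text))
```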

Answer 18

Use .decode() to decode. This will decode the string. Pass in 'utf-8' as the value inside the parentheses.

Answer 19

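def toString(string):
    try:
        # bytes have .decode(); a str does not and raises AttributeError in Python 3
        return string.decode("utf-8")
    except (AttributeError, ValueError):
        return string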

b = b'97.080.500'
s = '97.080.500'
print(toString(b))
print(toString(s))

Answer 20

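If you want to convert any bytes, not just a string that was converted to bytes:

import base64
import json

with open("bytesfile", "rb") as infile:
    # b85encode returns ASCII-only bytes, so decoding as ASCII always succeeds
    text = base64.b85encode(infile.read()).decode("ascii")

with open("bytesfile", "rb") as infile:
    # a JSON array with one integer per byte
    text2 = json.dumps(list(infile.read()))

This is not very efficient, however. It will turn a 2 MB picture into 9 MB.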
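The Base85 variant can be checked with an in-memory round trip. This is a sketch on made-up data (every byte value once) rather than a real file:

```python
import base64

raw = bytes(range(256))  # arbitrary binary data, not valid UTF-8 text

# b85encode returns bytes made only of ASCII characters, so decoding as ASCII is safe.
encoded_text = base64.b85encode(raw).decode("ascii")

# b85decode accepts the ASCII string and returns the original bytes.
restored = base64.b85decode(encoded_text)

print(len(raw), len(encoded_text), restored == raw)
```

Base85 writes 5 characters for every 4 bytes, so the text form is about 25% larger than the input.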

Answer 21

Try this:

bytes.fromhex('c3a9').decode('utf-8')
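For context, 'c3a9' is the hexadecimal form of the UTF-8 bytes for 'é'. A minimal round-trip sketch (bytes.hex() needs Python 3.5 or later):

```python
text = "\u00e9"                                   # 'é'
hex_string = text.encode("utf-8").hex()           # hexadecimal form of the UTF-8 bytes
round_tripped = bytes.fromhex(hex_string).decode("utf-8")

print(hex_string, round_tripped)
```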

Answer 22

We can decode the bytes object to produce a string using bytes.decode(encoding='utf-8', errors='strict'), as described in the documentation.

Python 3 example:

byte_value = b"abcde"
print("Initial value = {}".format(byte_value))
print("Initial value type = {}".format(type(byte_value)))
string_value = byte_value.decode("utf-8")
# utf-8 is used here because it is a very common encoding, but you need to use the encoding your data is actually in.
print("------------")
print("Converted value = {}".format(string_value))
print("Converted value type = {}".format(type(string_value)))

Output:

Initial value = b'abcde'
Initial value type = <class 'bytes'>
------------
Converted value = abcde
Converted value type = <class 'str'>

Note: In Python 3, the default encoding is utf-8. So <byte_string>.decode("utf-8") can also be written as <byte_string>.decode()

Answer 23

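Try using this one; this function ignores all bytes that are not valid in the given character set (such as utf-8) and returns a clean string. It is tested for Python 3.6 and above.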

def bin2str(text, encoding = 'utf-8'):
    """Converts a binary to Unicode string by removing all non Unicode char
    text: binary string to work on
    encoding: output encoding *utf-8"""

    return text.decode(encoding, 'ignore')

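Here, the function takes the binary data and decodes it (it converts the binary data to characters using the given character set, and the ignore argument skips all bytes that are not valid in that character set), and finally returns the string value you want.

If you are not sure about the encoding, use sys.getdefaultencoding() to get the default encoding of your device.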

Answer 24

Bytes

m=b'This is bytes'

Convert to a string

Method 1

m.decode("utf-8")

or

m.decode()

Method 2

import codecs
codecs.decode(m,encoding="utf-8")

or

import codecs
codecs.decode(m)

Method 3

str(m,encoding="utf-8")

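or

str(m, "utf-8")

Note that str(m) without an encoding does not decode the bytes: it returns the representation "b'This is bytes'".

Result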

'This is bytes'

Answer 27

You want to decode the bytes to produce a string:

>>> b"abcde"
b'abcde'

# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8") 
'abcde'

Answer 28

my_string = "hello"
my_string = my_string.encode()
my_string = my_string.decode()
