
Iterate over individual bytes in Python 3

When iterating over a bytes object in Python 3, one gets the individual bytes as ints:

>>> [b for b in b'123']
[49, 50, 51]

How can I get length-1 bytes objects instead?

The following works, but it is not very obvious to the reader and most likely performs badly:

>>> [bytes([b]) for b in b'123']
[b'1', b'2', b'3']

If you are concerned about the performance of this code, and an int as a byte is not a suitable interface in your case, then you should probably reconsider the data structures you use, e.g., use str objects instead.
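For instance, if the bytes really are encoded text, decoding once up front gives natural per-character iteration (a sketch; latin-1 here is an assumption, substitute whatever codec actually applies to your data):

```python
data = b'123'

# Decode once, then iterate over characters as str objects instead of ints.
text = data.decode('latin-1')
print([ch for ch in text])  # ['1', '2', '3']
```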

You could slice the bytes object to get length-1 bytes objects:

L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]

There is PEP 467 -- Minor API improvements for binary sequences, which proposes a bytes.iterbytes() method:

>>> list(b'123'.iterbytes())
[b'1', b'2', b'3']
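As of this writing that method is only a proposal and does not exist on bytes objects. A close approximation (a sketch; relies on memoryview.cast, available since Python 3.3, and iterbytes is just an illustrative name) is:

```python
def iterbytes(data):
    # Casting a memoryview to format 'c' makes iteration yield
    # length-1 bytes objects instead of ints.
    return iter(memoryview(data).cast('c'))

print(list(iterbytes(b'123')))  # [b'1', b'2', b'3']
```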

int.to_bytes

int objects have a to_bytes method which can be used to convert an int to its corresponding byte:

>>> import sys
>>> [i.to_bytes(1, sys.byteorder) for i in b'123']
[b'1', b'2', b'3']

As with some of the other answers, it's not clear that this is more readable than the OP's original solution: the length and byteorder arguments make it noisier, I think.

struct.unpack

Another approach would be to use struct.unpack, though this might also be considered difficult to read unless you are familiar with the struct module:

>>> import struct
>>> struct.unpack('3c', b'123')
(b'1', b'2', b'3')

(As jfs observes in the comments, the format string for struct.unpack can be constructed dynamically; in this case we know the number of individual bytes in the result must equal the number of bytes in the original bytestring, so struct.unpack(str(len(bytestring)) + 'c', bytestring) is possible.)
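That dynamic construction might be sketched as follows (to_bytes_tuple is a hypothetical name for illustration):

```python
import struct

def to_bytes_tuple(bytestring):
    # One 'c' (single-byte char) per byte in the input.
    fmt = str(len(bytestring)) + 'c'
    return struct.unpack(fmt, bytestring)

print(to_bytes_tuple(b'123'))    # (b'1', b'2', b'3')
print(to_bytes_tuple(b'12345'))  # (b'1', b'2', b'3', b'4', b'5')
```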

Performance

>>> import random, timeit
>>> bs = bytes(random.randint(0, 255) for i in range(100))

>>> # OP's solution
>>> timeit.timeit(setup="from __main__ import bs",
                  stmt="[bytes([b]) for b in bs]")
46.49886950897053

>>> # Accepted answer from jfs
>>> timeit.timeit(setup="from __main__ import bs",
                  stmt="[bs[i:i+1] for i in range(len(bs))]")
20.91463226894848

>>>  # Leon's answer
>>> timeit.timeit(setup="from __main__ import bs", 
                  stmt="list(map(bytes, zip(bs)))")
27.476876026019454

>>> # guettli's answer
>>> timeit.timeit(setup="from __main__ import iter_bytes, bs",        
                  stmt="list(iter_bytes(bs))")
24.107485140906647

>>> # user38's answer (with Leon's suggested fix)
>>> timeit.timeit(setup="from __main__ import bs", 
                  stmt="[chr(i).encode('latin-1') for i in bs]")
45.937552741961554

>>> # Using int.to_bytes
>>> timeit.timeit(setup="from __main__ import bs;from sys import byteorder", 
                  stmt="[x.to_bytes(1, byteorder) for x in bs]")
32.197659170022234

>>> # Using struct.unpack, converting the resulting tuple to list
>>> # to be fair to other methods
>>> timeit.timeit(setup="from __main__ import bs;from struct import unpack", 
                  stmt="list(unpack('100c', bs))")
1.902243083808571

struct.unpack seems to be at least an order of magnitude faster than the other methods, presumably because it operates at the byte level. int.to_bytes, on the other hand, performs worse than most of the "obvious" approaches.

I thought it might be useful to compare the runtimes of the different approaches, so I made a benchmark (using my library simple_benchmark):

[Benchmark plot: runtime of each approach vs. bytes object length]

Probably unsurprisingly, the NumPy solution is by far the fastest for large bytes objects.

But if a resulting list is desired, then both the NumPy solution (with tolist()) and the struct solution are much faster than the other alternatives.

I didn't include guettli's answer because it's almost identical to jfs's solution, except that it uses a generator function instead of a comprehension.

import numpy as np
import struct
import sys

from simple_benchmark import BenchmarkBuilder
b = BenchmarkBuilder()

@b.add_function()
def jfs(bytes_obj):
    return [bytes_obj[i:i+1] for i in range(len(bytes_obj))]

@b.add_function()
def snakecharmerb_tobytes(bytes_obj):
    return [i.to_bytes(1, sys.byteorder) for i in bytes_obj]

@b.add_function()
def snakecharmerb_struct(bytes_obj):
    return struct.unpack(str(len(bytes_obj)) + 'c', bytes_obj)

@b.add_function()
def Leon(bytes_obj):
    return list(map(bytes, zip(bytes_obj)))

@b.add_function()
def rusu_ro1_format(bytes_obj):
    return [b'%c' % i for i in bytes_obj]

@b.add_function()
def rusu_ro1_numpy(bytes_obj):
    return np.frombuffer(bytes_obj, dtype='S1')

@b.add_function()
def rusu_ro1_numpy_tolist(bytes_obj):
    return np.frombuffer(bytes_obj, dtype='S1').tolist()

@b.add_function()
def User38(bytes_obj):
    return [chr(i).encode() for i in bytes_obj]

@b.add_arguments('byte object length')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        yield size, b'a' * size

r = b.run()
r.plot()

Since Python 3.5 you can use %-formatting with bytes and bytearray:

[b'%c' % i for i in b'123']

Output:

[b'1', b'2', b'3']

The above solution is 2-3 times faster than your initial approach; if you want an even faster solution, I suggest using numpy.frombuffer:

import numpy as np
np.frombuffer(b'123', dtype='S1')

Output:

array([b'1', b'2', b'3'], 
      dtype='|S1')

The second solution is ~10% faster than struct.unpack (I used the same performance test as @snakecharmerb, against 100 random bytes).

A trio of map(), bytes() and zip() does the trick:

>>> list(map(bytes, zip(b'123')))
[b'1', b'2', b'3']

However, I don't think it is any more readable than [bytes([b]) for b in b'123'], or that it performs better.
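To see why the trick works: zip() over a single bytes object yields 1-tuples of ints, and bytes() turns each tuple back into a length-1 bytes object:

```python
tuples = list(zip(b'123'))
print(tuples)                      # [(49,), (50,), (51,)]
print([bytes(t) for t in tuples])  # [b'1', b'2', b'3']
```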

I use this helper method:

def iter_bytes(my_bytes):
    for i in range(len(my_bytes)):
        yield my_bytes[i:i+1]

Works on both Python 2 and Python 3.
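For example (the helper is repeated here so the snippet runs standalone):

```python
def iter_bytes(my_bytes):
    # Slicing returns a length-1 object of the same type on both
    # Python 2 (str) and Python 3 (bytes).
    for i in range(len(my_bytes)):
        yield my_bytes[i:i+1]

print(list(iter_bytes(b'123')))  # [b'1', b'2', b'3']
```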

A simple way to do this:

[bytes([i]) for i in b'123\xaa\xbb\xcc\xff']
