Iterate over individual bytes in Python 3

When iterating over a bytes object in Python 3, one gets the individual bytes as ints:
>>> [b for b in b'123']
[49, 50, 51]
How to get 1-length bytes objects instead?

The following is possible, but not very obvious for the reader and most likely performs badly:
>>> [bytes([b]) for b in b'123']
[b'1', b'2', b'3']
If you are concerned about the performance of this code, and an int as a byte is not a suitable interface in your case, then you should probably reconsider the data structures you use, e.g., use str objects instead.

You could slice the bytes object to get 1-length bytes objects:
L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]
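For example, applied to a bytestring containing a non-ASCII byte (the value here is just illustrative):

```python
bytes_obj = b'123\xff'

# Slicing returns 1-length bytes objects rather than ints
L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]
print(L)  # [b'1', b'2', b'3', b'\xff']
```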
There is PEP 467 -- Minor API improvements for binary sequences, which proposes a bytes.iterbytes() method:
>>> list(b'123'.iterbytes())
[b'1', b'2', b'3']
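Note, however, that PEP 467 has not been accepted, so bytes.iterbytes() does not exist in CPython and the call above raises AttributeError. A minimal stand-in (the name iterbytes is borrowed from the PEP; this function is not part of the standard library) might look like:

```python
def iterbytes(data):
    """Yield each byte of *data* as a 1-length bytes object."""
    for i in range(len(data)):
        yield data[i:i+1]

print(list(iterbytes(b'123')))  # [b'1', b'2', b'3']
```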
int.to_bytes

int objects have a to_bytes method which can be used to convert an int to its corresponding byte:
>>> import sys
>>> [i.to_bytes(1, sys.byteorder) for i in b'123']
[b'1', b'2', b'3']
As with some other answers, it's not clear that this is more readable than the OP's original solution: the length and byteorder arguments make it noisier, I think.
struct.unpack
Another approach would be to use struct.unpack, though this might also be considered difficult to read, unless you are familiar with the struct module:
>>> import struct
>>> struct.unpack('3c', b'123')
(b'1', b'2', b'3')
(As jfs observes in the comments, the format string for struct.unpack can be constructed dynamically; in this case we know the number of individual bytes in the result must equal the number of bytes in the original bytestring, so struct.unpack(str(len(bytestring)) + 'c', bytestring) is possible.)
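A sketch of that dynamic-format version, which avoids hard-coding the length (the helper name unpack_bytes is chosen here for illustration):

```python
import struct

def unpack_bytes(bytestring):
    # Build a format such as '5c' for a 5-byte string: one 'c' (char) per byte
    return struct.unpack(str(len(bytestring)) + 'c', bytestring)

print(unpack_bytes(b'hello'))  # (b'h', b'e', b'l', b'l', b'o')
```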
Performance
>>> import random, timeit
>>> bs = bytes(random.randint(0, 255) for i in range(100))
>>> # OP's solution
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[bytes([b]) for b in bs]")
46.49886950897053
>>> # Accepted answer from jfs
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[bs[i:i+1] for i in range(len(bs))]")
20.91463226894848
>>> # Leon's answer
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="list(map(bytes, zip(bs)))")
27.476876026019454
>>> # guettli's answer
>>> timeit.timeit(setup="from __main__ import iter_bytes, bs",
...               stmt="list(iter_bytes(bs))")
24.107485140906647
>>> # user38's answer (with Leon's suggested fix)
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[chr(i).encode('latin-1') for i in bs]")
45.937552741961554
>>> # Using int.to_bytes
>>> timeit.timeit(setup="from __main__ import bs;from sys import byteorder",
...               stmt="[x.to_bytes(1, byteorder) for x in bs]")
32.197659170022234
>>> # Using struct.unpack, converting the resulting tuple to list
>>> # to be fair to other methods
>>> timeit.timeit(setup="from __main__ import bs;from struct import unpack",
...               stmt="list(unpack('100c', bs))")
1.902243083808571
struct.unpack seems to be at least an order of magnitude faster than other methods, presumably because it operates at the byte level. int.to_bytes, on the other hand, performs worse than most of the "obvious" approaches.
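For a quick local re-run without copying each timeit call (timings vary by machine, so only the relative ordering is meaningful; the repetition count here is an arbitrary choice):

```python
import struct
import timeit

bs = bytes(range(100))

# Compare the OP's comprehension against struct.unpack on the same input
t_comp = timeit.timeit(lambda: [bytes([b]) for b in bs], number=10_000)
t_unpack = timeit.timeit(lambda: list(struct.unpack('100c', bs)), number=10_000)
print(f"comprehension: {t_comp:.4f}s, struct.unpack: {t_unpack:.4f}s")
```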
I thought it might be useful to compare the runtimes of the different approaches, so I made a benchmark (using my library simple_benchmark):
Probably unsurprisingly, the NumPy solution is by far the fastest solution for large bytes objects.
But if a resulting list is desired, then both the NumPy solution (with tolist()) and the struct solution are much faster than the other alternatives.
I didn't include guettli's answer because it's almost identical to jfs's solution; it just uses a generator function instead of a comprehension.
import numpy as np
import struct
import sys

from simple_benchmark import BenchmarkBuilder

b = BenchmarkBuilder()

@b.add_function()
def jfs(bytes_obj):
    return [bytes_obj[i:i+1] for i in range(len(bytes_obj))]

@b.add_function()
def snakecharmerb_tobytes(bytes_obj):
    return [i.to_bytes(1, sys.byteorder) for i in bytes_obj]

@b.add_function()
def snakecharmerb_struct(bytes_obj):
    return struct.unpack(str(len(bytes_obj)) + 'c', bytes_obj)

@b.add_function()
def Leon(bytes_obj):
    return list(map(bytes, zip(bytes_obj)))

@b.add_function()
def rusu_ro1_format(bytes_obj):
    return [b'%c' % i for i in bytes_obj]

@b.add_function()
def rusu_ro1_numpy(bytes_obj):
    return np.frombuffer(bytes_obj, dtype='S1')

@b.add_function()
def rusu_ro1_numpy_tolist(bytes_obj):
    return np.frombuffer(bytes_obj, dtype='S1').tolist()

@b.add_function()
def User38(bytes_obj):
    return [chr(i).encode() for i in bytes_obj]

@b.add_arguments('byte object length')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        yield size, b'a' * size

r = b.run()
r.plot()
Since Python 3.5 you can use %-formatting with bytes and bytearray:
[b'%c' % i for i in b'123']
output:
[b'1', b'2', b'3']
The above solution is 2-3 times faster than your initial approach; if you want an even faster solution, I suggest using numpy.frombuffer:
import numpy as np
np.frombuffer(b'123', dtype='S1')
output:
array([b'1', b'2', b'3'],
dtype='|S1')
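If a plain Python list is needed rather than a NumPy array, tolist() converts the S1 elements back into bytes objects:

```python
import numpy as np

# Each 'S1' element is a 1-byte string; tolist() yields Python bytes objects
arr = np.frombuffer(b'123', dtype='S1')
print(arr.tolist())  # [b'1', b'2', b'3']
```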
The second solution is ~10% faster than struct.unpack (I used the same performance test as @snakecharmerb, against 100 random bytes).
A trio of map(), bytes() and zip() does the trick:
>>> list(map(bytes, zip(b'123')))
[b'1', b'2', b'3']
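The trick works because zip() called with a single iterable wraps each element in a 1-tuple, and bytes() accepts an iterable of ints:

```python
data = b'123'

# zip() over a bytes object yields 1-tuples of ints...
tuples = list(zip(data))
print(tuples)  # [(49,), (50,), (51,)]

# ...and bytes() turns each 1-tuple into a 1-length bytes object
print([bytes(t) for t in tuples])  # [b'1', b'2', b'3']
```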
However, I don't think that it is any more readable than [bytes([b]) for b in b'123'], or that it performs any better.
I use this helper method:
def iter_bytes(my_bytes):
    for i in range(len(my_bytes)):
        yield my_bytes[i:i+1]
Works for Python 2 and Python 3.
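For example (repeating the helper definition so the snippet is self-contained):

```python
def iter_bytes(my_bytes):
    for i in range(len(my_bytes)):
        yield my_bytes[i:i+1]

# Slicing preserves the input type, so bytearray input yields bytearray slices
print(list(iter_bytes(b'123')))             # [b'1', b'2', b'3']
print(list(iter_bytes(bytearray(b'123'))))  # [bytearray(b'1'), bytearray(b'2'), bytearray(b'3')]
```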
A simple way to do it:
[bytes([i]) for i in b'123\xaa\xbb\xcc\xff']