
Converting int to bytes in Python 3

I was trying to build this bytes object in Python 3:

b'3\r\n'

so I tried the obvious (for me), and found a weird behaviour:

>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'

Apparently:

>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

I've been unable to find any pointers in the documentation on why the bytes conversion works this way. However, I did find some surprised messages in this Python issue about adding format to bytes (see also Python 3 bytes formatting):

http://bugs.python.org/issue3982

This interacts even more poorly with oddities like bytes(int) returning zeroes now

and:

It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior - which I never have - I'd rather it be a classmethod, invoked like "bytes.zeroes(n)".)

Can someone explain to me where this behaviour comes from?

From Python 3.2 onward you can use int.to_bytes:

>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'

https://docs.python.org/3/library/stdtypes.html#int.to_bytes

def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')
    
def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

Accordingly, x == int_from_bytes(int_to_bytes(x)). Note that the above encoding works only for unsigned (non-negative) integers.
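As a quick check of the unsigned-only caveat, here is a minimal sketch that repeats the two helpers above and shows what happens with a negative input. The sample values are only illustrative.

```python
def int_to_bytes(x: int) -> bytes:
    # Smallest number of bytes that holds a non-negative int (0 encodes as b'').
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')

def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

# Round trip for non-negative values, including zero.
for sample in (0, 1, 255, 256, 1024, 2**64):
    assert int_from_bytes(int_to_bytes(sample)) == sample

assert int_to_bytes(1024) == b'\x04\x00'
assert int_to_bytes(0) == b''

# to_bytes defaults to signed=False, so a negative int is rejected.
try:
    int_to_bytes(-1)
except OverflowError as exc:
    print('negative input rejected:', exc)
```

Note that zero encodes to an empty bytes object with this length formula; use max(1, ...) for the length if you need at least one byte.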

For signed integers, the bit length is a bit more tricky to calculate:

def int_to_bytes(number: int) -> bytes:
    return number.to_bytes(length=(8 + (number + (number < 0)).bit_length()) // 8, byteorder='big', signed=True)

def int_from_bytes(binary_data: bytes) -> int:
    return int.from_bytes(binary_data, byteorder='big', signed=True)

That's the way it was designed - and it makes sense because usually, you would call bytes on an iterable instead of a single integer:

>>> bytes([3])
b'\x03'

The docs state this, as well as the docstring for bytes:

 >>> help(bytes)
 ...
 bytes(int) -> bytes object of size given by the parameter initialized with null bytes
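To put the two constructor forms side by side, here is a small sketch of how bytes treats an int argument versus an iterable of ints. The variable names are only for illustration.

```python
zeros = bytes(3)             # int argument: that many null bytes
single = bytes([3])          # iterable argument: one byte with value 3
counted = bytes(range(3))    # any iterable of ints in range(256) works

assert zeros == b'\x00\x00\x00'
assert single == b'\x03'
assert counted == b'\x00\x01\x02'

# Each element must fit in one byte, otherwise ValueError is raised.
try:
    bytes([256])
except ValueError as exc:
    print('out of range:', exc)
```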

You can use struct.pack:

In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'

The ">" is the byte order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:

In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'

In [13]: struct.pack("B", 1)
Out[13]: '\x01'

This works the same on both Python 2 and Python 3; in Python 3 the results are bytes objects, so they are displayed with a b prefix (for example b'\x00\x00\x00\x01').

Note: the inverse operation (bytes to int) can be done with unpack.
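A short sketch of the round trip through struct under Python 3, using the same format strings as above. Note that struct.unpack always returns a tuple, even for a single value.

```python
import struct

packed = struct.pack('>I', 1)          # 4-byte unsigned int, big-endian
assert packed == b'\x00\x00\x00\x01'

(value,) = struct.unpack('>I', packed)  # unpack returns a 1-tuple here
assert value == 1

assert struct.pack('<H', 1) == b'\x01\x00'   # 2-byte unsigned, little-endian
assert struct.unpack('B', b'\x01') == (1,)

# Values that do not fit the chosen format raise struct.error.
try:
    struct.pack('B', 256)
except struct.error as exc:
    print('does not fit in one byte:', exc)
```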

Python 3.5+ introduces %-interpolation (printf-style formatting) for bytes:

>>> b'%d\r\n' % 3
b'3\r\n'

See PEP 0461 -- Adding % formatting to bytes and bytearray.

On earlier versions, you could use str and .encode('ascii') the result:

>>> s = '%d\r\n' % 3
>>> s.encode('ascii')
b'3\r\n'

Note: It is different from what int.to_bytes produces:

>>> n = 3
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0'
b'\x03'
>>> b'3' == b'\x33' != b'\x03'
True
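To make the distinction concrete, the following sketch builds the ASCII text form of a number in several ways and contrasts it with the binary form from int.to_bytes. The %-formatting line assumes Python 3.5 or later.

```python
n = 3

# Text form: the ASCII digits of the number, followed by CRLF.
via_percent = b'%d\r\n' % n                       # Python 3.5+
via_encode = str(n).encode('ascii') + b'\r\n'
via_values = bytes([ord('3'), 13, 10])
assert via_percent == via_encode == via_values == b'3\r\n'

# Binary form: the numeric value itself stored in one byte.
binary = n.to_bytes(1, 'big')
assert binary == b'\x03'
assert binary != b'3'      # b'3' is the byte 0x33, not 0x03
```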

The documentation says:

bytes(int) -> bytes object of size given by the parameter
              initialized with null bytes

The sequence:

b'3\r\n'

It consists of the character '3' (decimal 51), the character '\r' (13), and '\n' (10).

Therefore, you can build it by treating it as such, for example:

>>> bytes([51, 13, 10])
b'3\r\n'

>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'

>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'

Tested on IPython 1.1.0 & Python 3.2.3

The ASCIIfication of 3 is "\x33" not "\x03"!

That is what Python does for str(3), but it would be totally wrong for bytes, as they should be considered arrays of binary data and not be abused as strings.

The easiest way to achieve what you want is bytes((3,)), which is better than bytes([3]) because initializing a list is much more expensive, so never use lists when you can use tuples. You can convert bigger integers by using int.to_bytes, for example (1024).to_bytes(2, "little").

Initializing bytes with a given length makes sense and is the most useful, as they are often used to create some type of buffer for which you need memory of a given size allocated. I often use this when initializing arrays or expanding a file by writing zeros to it.
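As an illustration of that buffer use case, here is a minimal sketch that pads a payload to a fixed record size with bytes(n) and allocates a mutable buffer with bytearray(n). The helper pad_record and the sizes are hypothetical, chosen only for the example.

```python
def pad_record(payload: bytes, size: int) -> bytes:
    """Right-pad payload with null bytes up to size (hypothetical helper)."""
    if len(payload) > size:
        raise ValueError('payload longer than record size')
    return payload + bytes(size - len(payload))

record = pad_record(b'abc', 8)
assert record == b'abc\x00\x00\x00\x00\x00'

# bytearray(n) gives the same zero-filled allocation, but mutable.
buffer = bytearray(4)
buffer[0] = 0xFF
assert bytes(buffer) == b'\xff\x00\x00\x00'
```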

int (including Python2's long) can be converted to bytes using the following function:

import codecs

def int2bytes(i):
    hex_value = '{0:x}'.format(i)
    # make length of hex_value a multiple of two
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')

The reverse conversion can be done by another one:

import codecs
import six  # should be installed via 'pip install six'

long = six.integer_types[-1]

def bytes2int(b):
    return long(codecs.encode(b, 'hex_codec'), 16)

Both functions work on both Python2 and Python3.

I was curious about the performance of various methods for a single int in the range [0, 255], so I decided to do some timing tests.

Based on the timings below, and from the general trend I observed from trying many different values and configurations, struct.pack seems to be the fastest, followed by int.to_bytes, bytes, and with str.encode (unsurprisingly) being the slowest. Note that the results show some more variation than is represented, and int.to_bytes and bytes sometimes switched speed ranking during testing, but struct.pack is clearly the fastest.

Results in CPython 3.7 on Windows:

Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop

Test module (named int_to_byte.py):

"""Functions for converting a single int to a bytes object with that int's value."""

import random
import shlex
import struct
import timeit

def bytes_(i):
    """From Tim Pietzcker's answer:
    https://stackoverflow.com/a/21017834/8117067
    """
    return bytes([i])

def to_bytes(i):
    """From brunsgaard's answer:
    https://stackoverflow.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')

def struct_pack(i):
    """From Andy Hayden's answer:
    https://stackoverflow.com/a/26920966/8117067
    """
    return struct.pack('B', i)

# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://stackoverflow.com/a/31761722/8117067

def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921

    Similar to g10guang's answer:
    https://stackoverflow.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')

converters = [bytes_, to_bytes, struct_pack, chr_encode]

def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)

def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://stackoverflow.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))
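If you would rather not go through timeit.main and an importable module, a rough alternative sketch is to time the same four conversions with timeit.timeit and callables. This is not the method used for the results above; the lambda adds a constant call overhead to every measurement, so only the relative order is meaningful, and the numbers will vary between machines and runs.

```python
import struct
import timeit

# Same four conversions as in the test module, keyed by name.
converters = {
    'bytes_': lambda i: bytes([i]),
    'to_bytes': lambda i: i.to_bytes(1, byteorder='big'),
    'struct_pack': lambda i: struct.pack('B', i),
    'chr_encode': lambda i: chr(i).encode('latin1'),
}

def time_converters(value=63, number=10_000):
    """Return total seconds for `number` calls of each converter."""
    return {
        name: timeit.timeit(lambda f=func: f(value), number=number)
        for name, func in converters.items()
    }

timings = time_converters()
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{name}: {seconds:.6f} s')
```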

Although the prior answer by brunsgaard is an efficient encoding, it works only for unsigned integers. This one builds upon it to work for both signed and unsigned integers.

def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)

def bytes_to_int(b: bytes, *, signed: bool = False) -> int:
    return int.from_bytes(b, byteorder='big', signed=signed)

# Test unsigned:
for i in range(1025):
    assert i == bytes_to_int(int_to_bytes(i))

# Test signed:
for i in range(-1024, 1025):
    assert i == bytes_to_int(int_to_bytes(i, signed=True), signed=True)

For the encoder, (i + ((i * signed) < 0)).bit_length() is used instead of just i.bit_length() because the latter leads to an inefficient encoding of -128, -32768, etc.


Credit: CervEd for fixing a minor inefficiency.
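The effect of that adjustment can be seen by comparing the two length formulas for the signed case. The function names naive_length and adjusted_length are made up for this sketch; the arithmetic is the signed=True branch of the encoder above.

```python
def naive_length(i: int) -> int:
    # Signed length computed from i.bit_length() alone.
    return (i.bit_length() + 8) // 8

def adjusted_length(i: int) -> int:
    # Add 1 to negative values first, so -128 is measured like -127.
    return ((i + (i < 0)).bit_length() + 8) // 8

# -128 fits in one signed byte, but the naive formula asks for two.
assert naive_length(-128) == 2
assert adjusted_length(-128) == 1
assert (-128).to_bytes(1, 'big', signed=True) == b'\x80'
assert (-128).to_bytes(2, 'big', signed=True) == b'\xff\x80'

# Same pattern at the next boundary.
assert naive_length(-32768) == 3
assert adjusted_length(-32768) == 2
assert (-32768).to_bytes(2, 'big', signed=True) == b'\x80\x00'
```

Both lengths decode back to the same integer; the naive one simply spends an extra sign-extension byte at these boundaries.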

From bytes docs:

Accordingly, constructor arguments are interpreted as for bytearray().

Then, from bytearray docs:

The optional source parameter can be used to initialize the array in a few different ways:

  • If it is an integer, the array will have that size and will be initialized with null bytes.

Note that this differs from the 2.x (where x >= 6) behavior, where bytes is simply str:

>>> bytes is str
True

PEP 3112:

The 2.6 str differs from 3.0's bytes type in various ways; most notably, the constructor is completely different.

The behaviour comes from the fact that in Python prior to version 3, bytes was just an alias for str. In Python 3.x, bytes is an immutable version of bytearray: a completely new type, not backwards compatible.

Some answers don't work with large numbers.有些答案不适用于大数字。

Convert the integer to its hex representation, then convert that to bytes:

def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

Result:

>>> int_to_bytes(2**256 - 1)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
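For comparison, here is a sketch that checks the hex-based conversion against int.to_bytes with a computed length for a few non-negative values, including large ones. The names int_to_bytes_hex and int_to_bytes_native are only for this example.

```python
def int_to_bytes_hex(number: int) -> bytes:
    # Hex digits without the '0x' prefix, padded to an even length.
    hrepr = format(number, 'x')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

def int_to_bytes_native(number: int) -> bytes:
    return number.to_bytes((number.bit_length() + 7) // 8, 'big')

# Both approaches agree for positive integers of any size.
for n in (1, 255, 256, 2**64, 2**256 - 1):
    assert int_to_bytes_hex(n) == int_to_bytes_native(n)

assert len(int_to_bytes_hex(2**256 - 1)) == 32

# They differ only for zero: one null byte versus an empty result.
assert int_to_bytes_hex(0) == b'\x00'
assert int_to_bytes_native(0) == b''
```

Neither version handles negative numbers: the hex string would start with a minus sign, and to_bytes without signed=True raises OverflowError.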

Since you want to deal with the binary representation, the best option is to use ctypes.

import ctypes
x = ctypes.c_int(1234)
bytes(x)

You must use a specific integer representation (signed/unsigned and the number of bits: c_uint8, c_int8, c_uint16, ...).
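A small sketch of what the ctypes route produces, using fixed-width types so the sizes do not depend on the platform. The byte order is the machine's native order, which is why the comparison below uses sys.byteorder; ctypes also does no overflow checking on integer types, so out-of-range values wrap silently.

```python
import ctypes
import sys

# A 16-bit unsigned value: two bytes in the machine's native byte order.
raw16 = bytes(ctypes.c_uint16(1234))
assert raw16 == (1234).to_bytes(2, sys.byteorder)

# A signed 8-bit value: -1 is stored as 0xFF (two's complement).
raw8 = bytes(ctypes.c_int8(-1))
assert raw8 == b'\xff'

# No overflow checking: 256 wraps around to 0 in an 8-bit unsigned type.
wrapped = ctypes.c_uint8(256)
assert wrapped.value == 0
```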

I think you can convert the int to str first, before you convert to bytes. That should produce the format you want.

bytes(str(your_number),'UTF-8') + b'\r\n'

It works for me in py3.8.

>>> chr(116).encode()
b't'

If you don't care about performance, you can convert the int to str first.

number = 1024
str(number).encode()

If the question is how to convert an integer itself (not its string equivalent) into bytes, I think the robust answer is:

>>> i = 5
>>> i.to_bytes(2, 'big')
b'\x00\x05'
>>> int.from_bytes(i.to_bytes(2, 'big'), byteorder='big')
5

More information on these methods here:

  1. https://docs.python.org/3.8/library/stdtypes.html#int.to_bytes
  2. https://docs.python.org/3.8/library/stdtypes.html#int.from_bytes
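Two details of these methods are easy to trip over: the byte order must match on both sides, and the requested length must be large enough. A minimal sketch, with values picked only for illustration:

```python
value = 5

big = value.to_bytes(2, 'big')
little = value.to_bytes(2, 'little')
assert big == b'\x00\x05'
assert little == b'\x05\x00'

# Decoding with the matching byte order restores the value.
assert int.from_bytes(little, 'little') == value

# Decoding with the wrong byte order silently gives a different number.
assert int.from_bytes(little, 'big') == 1280   # 0x0500

# A length that is too small raises OverflowError.
try:
    (256).to_bytes(1, 'big')
except OverflowError as exc:
    print('does not fit in one byte:', exc)
```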
