Converting a list of 8-bit values to an array of 32-bit integers in Python

Question

What I have:

textdata = "this is my test data"
DataArray = [ord(c) for c in textdata]

Now I want to transform this into an array of 32-bit integers by combining every 4 elements of the list into one integer.

Example: DataArray[0:4] would become one 32-bit integer, then the next 4 elements would become the next integer, and so on. In the end, I would have an array of 32-bit integers holding all the results.

How can I do this in Python without iterating over the whole string? Is there a simple way to do this?

Answer 1

As long as the length of your string is a multiple of 4, you can do this very efficiently with NumPy:

import numpy as np
data = np.fromstring(textdata, dtype='>u4')
# array([1952999795,  543781664, 1836654708, 1702065184, 1684108385])

'>u4' means 'big-endian unsigned 4-byte integer'.

Edit: If you use NumPy >= 1.14, np.fromstring is deprecated, and the right way to process your text is to call np.frombuffer(textdata.encode(), dtype='>u4').
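As a minimal sketch of the np.frombuffer approach, the snippet below also covers input whose length is not a multiple of 4. The helper name to_uint32_be and the choice to pad with null bytes are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np

def to_uint32_be(text):
    """Pack ASCII text into big-endian unsigned 32-bit integers (hypothetical helper)."""
    raw = text.encode("ascii")
    # Assumption: pad with null bytes so the length becomes a multiple of 4.
    raw += b"\x00" * (-len(raw) % 4)
    # Reinterpret the bytes as 4-byte big-endian unsigned integers.
    return np.frombuffer(raw, dtype=">u4")

words = to_uint32_be("this is my test data")
print(words.tolist())
# [1952999795, 543781664, 1836654708, 1702065184, 1684108385]
```

Note that np.frombuffer does not copy the data, so the returned array is read-only; call .copy() on it if you need to modify the values.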

Answer 2

Using NumPy:

>>> import numpy as np

>>> a = np.frombuffer(b'this is my test data', dtype=np.int32)
>>> a
array([1936287860,  544434464, 1948285293,  544502629, 1635017060], dtype=int32)
>>> a.tobytes()
b'this is my test data'

Use '<i4' or a similar explicit dtype to get an endianness that is portable between machines.

I'm assuming that you can keep your initial data as bytes rather than unicode, because you really should try hard to do that.
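To illustrate the endianness point, here is a small sketch that reads the same bytes with an explicit little-endian and an explicit big-endian dtype. The variable names are illustrative only.

```python
import numpy as np

raw = b"this is my test data"

# Same 20 bytes, interpreted with two explicit byte orders.
little = np.frombuffer(raw, dtype="<i4")
big = np.frombuffer(raw, dtype=">i4")

print(little.tolist())
# [1936287860, 544434464, 1948285293, 544502629, 1635017060]
print(big.tolist())
# [1952999795, 543781664, 1836654708, 1702065184, 1684108385]

# Both arrays are views of the same underlying bytes, so both round-trip.
assert little.tobytes() == raw
assert big.tobytes() == raw
```

With a plain np.int32 the result depends on the machine's native byte order, which is why spelling out '<' or '>' is the safer choice when data moves between systems.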

Answer 3

You can use Python's built-in struct module:

from struct import unpack

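textdata = "this is my test data"
# unpack() needs bytes in Python 3, so encode the string first.
# '<' fixes little-endian byte order; a bare 'i' would use the machine's native order.
data = list(unpack('<' + 'i' * (len(textdata) // 4), textdata.encode()))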

Result:

[1936287860, 544434464, 1948285293, 544502629, 1635017060]

You won't need to iterate over the string, and you can pick other format characters if, for example, you want unsigned integers.
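As a sketch of what other format characters look like in practice, the snippet below unpacks the same bytes as signed little-endian ('i' with '<') and as unsigned big-endian ('I' with '>'). It assumes ASCII input whose length is a multiple of 4; struct.unpack raises struct.error otherwise.

```python
import struct

raw = "this is my test data".encode("ascii")
count = len(raw) // 4  # number of 32-bit words

# '<' = little-endian, 'i' = signed 32-bit integer
signed_le = list(struct.unpack(f"<{count}i", raw))
# '>' = big-endian, 'I' = unsigned 32-bit integer
unsigned_be = list(struct.unpack(f">{count}I", raw))

print(signed_le)
# [1936287860, 544434464, 1948285293, 544502629, 1635017060]
print(unsigned_be)
# [1952999795, 543781664, 1836654708, 1702065184, 1684108385]
```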

Answer 4

You could use something like the following, which uses bit manipulation (big-endian):

def chunk2int(chunk):
    """ Converts a chunk (string) into an int, 8 bits per character """
    val = 0
    for c in chunk:
        val = (val << 8) | (ord(c) & 0xFF)
    return val

def int2chunk(val):
    """ Converts an int into a chunk, consuming 8 bits per character """
    rchunk = []
    while val:
        rchunk.append(val & 0xFF)
        val >>= 8

    return ''.join(chr(c) for c in reversed(rchunk))

textdata = "this is my test data"

chunks = [textdata[i:i + 4] for i in range(0, len(textdata), 4)]
print(chunks)

data = [chunk2int(c) for c in chunks]
print(data)

chunks = [int2chunk(d) for d in data]
print(chunks)

Produces:

['this', ' is ', 'my t', 'est ', 'data']
[1952999795, 543781664, 1836654708, 1702065184, 1684108385]
['this', ' is ', 'my t', 'est ', 'data']

This works as long as every character in your input text satisfies 1 <= ord(c) <= 255. If there are null bytes in your string, int2chunk may terminate early, because its loop stops as soon as the remaining value is 0; in that case you'd have to pad the chunks.

There's also the struct module, which may be worth looking into and makes changing the endianness much simpler.
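One way to sidestep the null-byte caveat is to work on bytes with a fixed width, using the built-in int.from_bytes and int.to_bytes. This is a sketch of a swapped-in technique rather than the answer's own code; the function names are hypothetical, and it assumes the input length is a multiple of the width.

```python
def bytes_to_ints(raw, width=4, byteorder="big"):
    """Split raw bytes into fixed-width chunks and convert each to an int."""
    return [int.from_bytes(raw[i:i + width], byteorder)
            for i in range(0, len(raw), width)]

def ints_to_bytes(values, width=4, byteorder="big"):
    """Inverse of bytes_to_ints: each int always yields exactly `width` bytes."""
    return b"".join(v.to_bytes(width, byteorder) for v in values)

raw = b"\x00abcthis"  # the first chunk starts with a null byte
values = bytes_to_ints(raw)
print(values)
# [6382179, 1952999795]

# The leading null byte survives the round trip because the width is fixed.
assert ints_to_bytes(values) == raw
```

Passing byteorder="little" to both functions switches the byte order without touching the rest of the code.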

Answer 1
1 2018-07-04 22:15:25

Answer 2
1 accepted 2018-07-04 22:15:33

Answer 3
1 2018-07-04 22:17:25

Answer 4
0 2018-07-04 22:13:29
