
Convert an 8-bit list to a 32-bit integer array in Python

What I have:

textdata = "this is my test data"
DataArray = [ord(c) for c in textdata]

Now I want to transform this into an array of 32-bit integers by combining every 4 elements of the list.

Example: DataArray[0:4] would become one 32-bit integer, then the next 4 elements would become the next integer, and so on. In the end, I would have an array of 32-bit integers holding all the results.

How can I do this in Python without iterating over the whole string? Is there a simple way to do this?

As long as the length of your string is a multiple of 4, you can use NumPy in a very efficient way:

import numpy as np
data = np.fromstring(textdata, dtype='>u4')
# array([1952999795,  543781664, 1836654708, 1702065184, 1684108385])

'>u4' means 'big-endian unsigned 4-byte integer'.

Edit: As of NumPy 1.14, np.fromstring is deprecated for binary data, and the right way to process your text is to call np.frombuffer(textdata.encode(), dtype='>u4').
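The np.frombuffer call above only works when the byte length is a multiple of 4. Here is a minimal sketch of one way to handle other lengths, assuming that padding the tail with zero bytes is acceptable for your data. The helper name to_u32_be and the zero-padding policy are my own assumptions, not part of the answer above.

```python
import numpy as np

def to_u32_be(text, encoding="utf-8"):
    """Hypothetical helper: encode text and view it as big-endian uint32 words."""
    raw = text.encode(encoding)
    # Assumption: pad the tail with zero bytes so the length is a multiple of 4.
    raw += b"\x00" * (-len(raw) % 4)
    return np.frombuffer(raw, dtype=">u4")

words = to_u32_be("this is my test data")
print(words.tolist())

# A 5-byte input is padded to 8 bytes, giving two words.
padded = to_u32_be("abcde")
print([hex(w) for w in padded.tolist()])
```

The array returned by np.frombuffer is a read-only view of the bytes object, so copy it (for example with .copy()) if you need to modify the values.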

Using NumPy:

>>> import numpy as np

>>> a = np.frombuffer(b'this is my test data', dtype=np.int32)
>>> a
array([1936287860,  544434464, 1948285293,  544502629, 1635017060], dtype=int32)
>>> a.tobytes()
b'this is my test data'

Use '<i4' (little-endian) or a similar explicit dtype so that the byte order is portable between machines.

I'm assuming that you can keep your initial data as bytes rather than unicode, because you really should try hard to do that.
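To illustrate why staying in bytes matters, here is a small sketch showing that the number of bytes depends on the encoding, not on the number of characters. The sample string is my own choice for illustration.

```python
text = "naïve"  # 5 characters, one of them non-ASCII

utf8_bytes = text.encode("utf-8")      # 'ï' takes 2 bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # 'ï' takes 1 byte in Latin-1

print(len(text), len(utf8_bytes), len(latin1_bytes))
```

Because the packing into 32-bit integers operates on bytes, the same text can produce a different number of integers depending on how it was encoded.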

You can use the struct module from the Python standard library:

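from struct import unpack

# unpack() requires bytes, so use a bytes literal (or textdata.encode()).
textdata = b"this is my test data"
# 'i' is a signed 32-bit int in the machine's native byte order.
data = list(unpack('i' * (len(textdata) // 4), textdata))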

Result (on a little-endian machine):

[1936287860, 544434464, 1948285293, 544502629, 1635017060]

You don't need to iterate over the string yourself, and other format characters are available if, for example, you want unsigned integers.
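As a sketch of the format-character point, the following uses 'I' (unsigned 32-bit) with an explicit byte-order prefix, so the result does not depend on the machine. The variable names are my own.

```python
import struct

raw = b"this is my test data"
count = len(raw) // 4  # number of 32-bit words

# '<' forces little-endian, '>' forces big-endian; 'I' is unsigned 32-bit.
little = list(struct.unpack("<%dI" % count, raw))
big = list(struct.unpack(">%dI" % count, raw))

print(little)
print(big)
```

Note that struct.unpack raises struct.error if the buffer length does not match the format exactly, so the input length must be a multiple of 4 here.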

You could use something like the following, which uses bit manipulation (big-endian):

def chunk2int(chunk):
    """ Converts a chunk (string) into an int, 8 bits per character """
    val = 0
    for c in chunk:
        val = (val << 8) | (ord(c) & 0xFF)
    return val

def int2chunk(val):
    """ Converts an int into a chunk, consuming 8 bits per character """
    rchunk = []
    while val:
        rchunk.append(val & 0xFF)
        val >>= 8

    return ''.join(chr(c) for c in reversed(rchunk))

textdata = "this is my test data"

chunks = [textdata[i:i + 4] for i in range(0, len(textdata), 4)]
print(chunks)

data = [chunk2int(c) for c in chunks]
print(data)

chunks = [int2chunk(d) for d in data]
print(chunks)

Produces:

['this', ' is ', 'my t', 'est ', 'data']
[1952999795, 543781664, 1836654708, 1702065184, 1684108385]
['this', ' is ', 'my t', 'est ', 'data']

If every character in your input text satisfies 1 <= ord(c) <= 255, this will work. If there are null bytes in your string, the int2chunk function may terminate early, in which case you would have to pad the chunks.
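One way to sidestep the null-byte problem is to work on bytes and always emit a fixed width. This is a minimal sketch using the built-in int.from_bytes and int.to_bytes; the function names and the sample data are my own, and it assumes Python 3 and 4-byte chunks.

```python
def chunk_to_int(chunk):
    """Interpret a bytes chunk as a big-endian unsigned integer."""
    return int.from_bytes(chunk, byteorder="big")

def int_to_chunk(value, width=4):
    """Always emit `width` bytes, so leading null bytes are preserved."""
    return value.to_bytes(width, byteorder="big")

raw = b"ab\x00\x00\x00\x00cd"  # contains null bytes
chunks = [raw[i:i + 4] for i in range(0, len(raw), 4)]

values = [chunk_to_int(c) for c in chunks]
restored = b"".join(int_to_chunk(v) for v in values)

print([hex(v) for v in values])
print(restored == raw)
```

Because the width is fixed, the second chunk (which starts with two null bytes) round-trips intact instead of being shortened.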

There's also the struct module, which may be worth looking into, and which lets you change the endianness much more simply.
