
Converting a numpy array of string fields to numerical format

I have an array of strings grouped into three fields:

import numpy as np

x = np.array([(-1, 0, 1),
              (-1, 1, 0),
              (0, 1, -1),
              (0, -1, 1)],
             dtype=[('a', 'S2'),
                    ('b', 'S2'),
                    ('c', 'S2')])

I would like to convert it to a plain numerical array of shape 4x3 (preferably of type np.int8, but that is not required) instead of an array with fields.

My general approach is to transform it into a 4x3 array of type 'S2', then use astype to make it numerical. The problem is that the only approach I can think of involves both view and np.lib.stride_tricks.as_strided, which doesn't seem like a very robust solution:

y = np.lib.stride_tricks.as_strided(x.view(dtype='S2'),
                                    shape=(4, 3), strides=(6, 2))
z = y.astype(np.int8)

This works for the toy case shown here, but I feel there must be a simpler way to unpack an array whose fields all have the same dtype. What is a more robust alternative?

NumPy 1.16 added structured_to_unstructured, which serves exactly this purpose:

from numpy.lib.recfunctions import structured_to_unstructured
y = structured_to_unstructured(x)  # 2d array of 'S2'
z = y.astype(np.int8)
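As a quick sanity check, the snippet below applies this to the array from the question and inspects the result. It is a minimal sketch assuming NumPy 1.16 or later. The cross-check that stacks the fields by name is not part of the original answer; it is only there to confirm the result and could serve as a fallback where structured_to_unstructured is unavailable.

```python
import numpy as np
from numpy.lib.recfunctions import structured_to_unstructured

# The array from the question: three fields, each a 2-byte string.
x = np.array([(-1, 0, 1),
              (-1, 1, 0),
              (0, 1, -1),
              (0, -1, 1)],
             dtype=[('a', 'S2'), ('b', 'S2'), ('c', 'S2')])

y = structured_to_unstructured(x)  # plain 2-D array of 'S2'
z = y.astype(np.int8)              # parse the byte strings as integers

# Cross-check (an illustrative alternative, not from the answer):
# pull each field out by name and stack them as columns.
z_alt = np.stack([x[name] for name in x.dtype.names], axis=1).astype(np.int8)

print(z.shape, z.dtype)
print(np.array_equal(z, z_alt))
```

Both routes should give the same 4x3 int8 array, with one column per field in the order the fields are declared.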

In earlier versions of numpy, you can combine x.data and np.frombuffer to create another array from the same data in memory without having to use strides. It doesn't bring a performance gain though, as the run time is dominated by the cast from S2 to int8.

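import numpy as np

n = 1000

def f1(x):
    # Original approach: view the raw bytes as 'S2' and set the strides by hand
    # (6 bytes per record, 2 bytes per field).
    y = np.lib.stride_tricks.as_strided(x.view(dtype='S2'),
                                        shape=(n, 3),
                                        strides=(6, 2))
    return y.astype(np.int8)

def f2(x):
    # Reinterpret the same memory buffer as a flat 'S2' array, then reshape.
    y = np.frombuffer(x.data, dtype='S2').reshape((n, 3))
    return y.astype(np.int8)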


x = np.array([(i%3-1, (i+1)%3-1, (i+2)%3-1)
              for i in range(n)],  # range, since xrange exists only in Python 2
             dtype='S2,S2,S2')

z1 = f1(x)
z2 = f2(x)
assert (z1==z2).all()
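To see for yourself that the two variants perform about the same, something like the following timing sketch could be used. It restates f1 and f2 so that it runs on its own under Python 3. The repeat count of 100 and the use of timeit are illustrative choices, not part of the original answer, and the absolute numbers will vary with the machine and the NumPy version.

```python
import timeit

import numpy as np

n = 1000

def f1(x):
    # View the raw bytes as 'S2' and set the strides by hand.
    y = np.lib.stride_tricks.as_strided(x.view(dtype='S2'),
                                        shape=(n, 3),
                                        strides=(6, 2))
    return y.astype(np.int8)

def f2(x):
    # Reinterpret the same buffer as a flat 'S2' array, then reshape.
    y = np.frombuffer(x.data, dtype='S2').reshape((n, 3))
    return y.astype(np.int8)

x = np.array([(i % 3 - 1, (i + 1) % 3 - 1, (i + 2) % 3 - 1)
              for i in range(n)],
             dtype='S2,S2,S2')

# Time each variant over the same input (100 calls each, an arbitrary choice).
t1 = timeit.timeit(lambda: f1(x), number=100)
t2 = timeit.timeit(lambda: f2(x), number=100)
print("as_strided: %.4f s, frombuffer: %.4f s" % (t1, t2))
```

If the answer's claim holds, the two timings should be close, because both functions spend most of their time in the astype call.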
