Numpy: Fix array with rows of different lengths by filling the empty elements with zeros
The functionality I am looking for looks something like this:
data = np.array([[1, 2, 3, 4],
[2, 3, 1],
[5, 5, 5, 5],
[1, 1]])
result = fix(data)
print(result)
[[ 1. 2. 3. 4.]
[ 2. 3. 1. 0.]
[ 5. 5. 5. 5.]
[ 1. 1. 0. 0.]]
The data arrays I'm working with are very large, so I would appreciate the most efficient solution.
Edit: The data is read in from disk as a Python list of lists.
This could be one approach:
import numpy as np

def numpy_fillna(data):
    # Get lengths of each row of data
    lens = np.array([len(i) for i in data])
    # Mask of valid places in each row
    mask = np.arange(lens.max()) < lens[:, None]
    # Setup output array and put elements from data into masked positions
    out = np.zeros(mask.shape, dtype=data.dtype)
    out[mask] = np.concatenate(data)
    return out
Sample input and output:
In [222]: # Input object dtype array
...: data = np.array([[1, 2, 3, 4],
...: [2, 3, 1],
...: [5, 5, 5, 5, 8 ,9 ,5],
...: [1, 1]], dtype=object)
In [223]: numpy_fillna(data)
Out[223]:
array([[1, 2, 3, 4, 0, 0, 0],
[2, 3, 1, 0, 0, 0, 0],
[5, 5, 5, 5, 8, 9, 5],
[1, 1, 0, 0, 0, 0, 0]], dtype=object)
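Since the question's edit says the data arrives as a plain Python list of lists, the same masking idea can probably be applied without building an object array first. The following is a minimal sketch under that assumption; the name `fill_ragged` and its `dtype` parameter are illustrative and not part of the answer above, and the float default is chosen only to match the output shown in the question.

```python
import numpy as np

def fill_ragged(rows, dtype=float):
    """Pad variable-length rows with zeros (hypothetical helper)."""
    # Length of every row
    lens = np.array([len(r) for r in rows])
    if lens.size == 0:
        return np.zeros((0, 0), dtype=dtype)
    # True where a row has a real value, False where padding is needed
    mask = np.arange(lens.max()) < lens[:, None]
    out = np.zeros(mask.shape, dtype=dtype)
    # Boolean assignment fills the True positions in row-major order
    out[mask] = np.concatenate([np.asarray(r, dtype=dtype) for r in rows])
    return out

result = fill_ragged([[1, 2, 3, 4], [2, 3, 1], [5, 5, 5, 5], [1, 1]])
print(result)
```

Passing an explicit numeric dtype keeps the result from being an object array, which should make later arithmetic on it faster.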
You could use pandas instead of NumPy:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 2, 3, 4],
...: [2, 3, 1],
...: [5, 5, 5, 5],
...: [1, 1]], dtype=float)
In [3]: df.fillna(0.0).values
Out[3]:
array([[ 1., 2., 3., 4.],
[ 2., 3., 1., 0.],
[ 5., 5., 5., 5.],
[ 1., 1., 0., 0.]])
Use np.pad():
In [62]: arr
Out[62]:
[array([0]),
array([83, 74]),
array([87, 61, 23]),
array([71, 3, 81, 77]),
array([20, 44, 20, 53, 60]),
array([54, 36, 74, 35, 49, 54]),
array([11, 36, 0, 98, 29, 87, 21]),
array([ 1, 22, 62, 51, 45, 40, 36, 86]),
array([ 7, 22, 83, 58, 43, 59, 45, 81, 92]),
array([68, 78, 70, 67, 77, 64, 58, 88, 13, 56])]
In [63]: max_len = np.max([len(a) for a in arr])
In [64]: np.asarray([np.pad(a, (0, max_len - len(a)), 'constant', constant_values=0) for a in arr])
Out[64]:
array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[83, 74, 0, 0, 0, 0, 0, 0, 0, 0],
[87, 61, 23, 0, 0, 0, 0, 0, 0, 0],
[71, 3, 81, 77, 0, 0, 0, 0, 0, 0],
[20, 44, 20, 53, 60, 0, 0, 0, 0, 0],
[54, 36, 74, 35, 49, 54, 0, 0, 0, 0],
[11, 36, 0, 98, 29, 87, 21, 0, 0, 0],
[ 1, 22, 62, 51, 45, 40, 36, 86, 0, 0],
[ 7, 22, 83, 58, 43, 59, 45, 81, 92, 0],
[68, 78, 70, 67, 77, 64, 58, 88, 13, 56]])
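For comparison, a different technique from the one in this answer is the standard library's `itertools.zip_longest`, which pads while transposing. This is only a sketch; `pad_with_zip_longest` is a made-up name, and because it iterates element by element in Python it is unlikely to beat the array-based approaches on large inputs.

```python
from itertools import zip_longest

import numpy as np

def pad_with_zip_longest(rows, fill=0):
    # zip_longest(*rows) walks the rows column by column and substitutes
    # `fill` once a shorter row runs out of elements.
    columns = list(zip_longest(*rows, fillvalue=fill))
    # Transposing turns the padded columns back into padded rows.
    return np.array(columns).T

padded = pad_with_zip_longest([[0], [83, 74], [87, 61, 23]])
print(padded)
```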
This would be nicer if done in some vectorized way, but I'm still a beginner, so this is all I could come up with for now!
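import numpy as np
import numba as nb

# Ragged rows need dtype=object on recent NumPy versions
a = np.array([[1, 2, 3, 4],
              [2, 3, 1],
              [5, 5, 5, 5, 5],
              [1, 1]], dtype=object)

# Object-dtype arrays are not supported in nopython mode,
# so object mode is requested explicitly
@nb.jit(forceobj=True)
def f(a):
    # Length of the longest row
    l = len(max(a, key=len))
    a0 = np.empty(a.shape + (l,))
    # Pad each row on the right with zeros up to length l
    for n, i in enumerate(a.flat):
        a0[n] = np.pad(i, (0, l - len(i)), mode='constant')
    return a0

print(f(a))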
import numpy as np

# Ragged rows need dtype=object on recent NumPy versions
data = np.array([[1, 2, 3, 4],
                 [2, 3, 1],
                 [5, 5, 5, 5],
                 [1, 1]], dtype=object)
max_len = max(len(row) for row in data)
np.array([np.pad(data[i],
                 (0, max_len - len(data[i])),
                 'constant',
                 constant_values=0) for i in range(len(data))])
The lengths of the individual rows are computed, and the maximum of these lengths is stored in a variable. Every row of the matrix is then padded with zeros on the right to match that maximum length.
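The steps described above (compute the lengths, take the maximum, pad on the right) can also be sketched by preallocating the result and copying each row into place, which avoids creating one temporary padded array per row. The name `pad_right` and the float output dtype are assumptions made for illustration, not part of the answer.

```python
import numpy as np

def pad_right(rows, fill=0):
    """Right-pad variable-length rows to a common width (hypothetical helper)."""
    # Step 1: lengths of the individual rows
    lengths = [len(r) for r in rows]
    # Step 2: the maximum length decides the output width
    max_len = max(lengths, default=0)
    # Step 3: start from an array full of the fill value ...
    out = np.full((len(rows), max_len), fill, dtype=float)
    # ... and overwrite the left part of each row with the real values
    for i, row in enumerate(rows):
        out[i, :len(row)] = row
    return out

padded_rows = pad_right([[1, 2, 3, 4], [2, 3, 1], [5, 5, 5, 5], [1, 1]])
print(padded_rows)
```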