Numpy: replacing zeros in numpy array with a numpy array

I am working with data that I want to pivot. Note that I am limited to numpy and cannot use pandas. The original data looks like this:

data = [
  [ 1, a, [<metric1>, <metric2>] ],
  [ 1, b, [<metric1>, <metric2>] ],
  [ 2, b, [<metric1>, <metric2>] ],
  [ 2, c, [<metric1>, <metric2>] ],
  [ 3, a, [<metric1>, <metric2>] ],
  [ 3, c, [<metric1>, <metric2>] ],
  ...etc
]

Pivoting my data with numpy:

rows, row_pos = np.unique(data[:, row_index], return_inverse=True)
cols, col_pos = np.unique(data[:, col_index], return_inverse=True)
pivot_table = np.zeros((len(rows), len(cols)), dtype=object)
pivot_table[row_pos, col_pos] = data[:, pivot_index]

The resulting format is:

cols = [a, b, c, ...]
rows = [1, 2, 3, ...]
pivot_table = [
  [ [<metric1>, <metric2>], [<metric1>, <metric2>], 0, ... ],
  [ 0, [<metric1>, <metric2>], [<metric1>, <metric2>], ... ],
  [ [<metric1>, <metric2>], 0, [<metric1>, <metric2>], ... ],
  ...
]

The pivoted table is eventually rendered; the renderer notes where the zeros are and creates the correct number of cells so that the table is formatted correctly.

This is just a temporary workaround. Originally I tried replacing the zeros with a numpy array (i.e., [0,0]):

pivot_table[pivot_table == 0] = [0,0]

But I got the following error:

TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions

My temporary fix was sufficient, but I am limited when I want to do something such as adding a row of column sums. I have a number of approaches in mind but don't know how to execute them:

  1. As mentioned above, replacing the zeros after the fact with a list of zeros.
  2. When initially creating the table using the indexes from np.unique, use a default value to fill the table instead of zeros.
  3. Pull the metrics out of the list into the array, i.e., [ 1, a, <metric1>, <metric2> ]. This is likely the best solution to simplify aggregate functions.

Are there any solutions for any of the approaches mentioned?

Here is how to get your approach 2 to work:

fillvalue = np.empty((), 'O')
fillvalue[()] = [0, 0]
pivot_table = np.full((len(rows), len(cols)), fillvalue)

etc.

Note that the [0, 0] entries are all the same object, so if you want to change one of them you shouldn't modify the list object in place; instead, create a new list and assign it to the array position.

If you want a 3D numerical array instead of an array of lists, the quick fix is np.array(pivot_table.tolist()).
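
The shared-object caveat above is easy to trip over, so here is a minimal sketch of it. The small (2, 3) shape and the name `table` are made up for illustration; only `np.empty`, `np.full` and plain indexing are used.

```python
import numpy as np

# 0-d object array wrapping a single list, as in the snippet above
fillvalue = np.empty((), dtype=object)
fillvalue[()] = [0, 0]

# Hypothetical small table; every cell refers to the same list object
table = np.full((2, 3), fillvalue)
shared = table[0, 0] is table[1, 2]

# Safe update: bind a new list to one position only
table[0, 0] = [5, 6]

# Unsafe update: mutating in place is visible in every cell that
# still refers to the original list
table[1, 2].append(99)
after_mutation = table[0, 1]
```

After this runs, `table[0, 0]` holds its own list, while the other five cells all show the appended value because they still point at one list.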

Trying to recreate your case:

In [182]: a,b,c = 0,1,2
In [183]: metric1, metric2 = 100,200
In [186]: data = [
     ...:   [ 1, a, [metric1, metric2] ],
     ...:   [ 1, b, [metric1, metric2] ],
     ...:   [ 2, b, [metric1, metric2] ],
     ...:   [ 2, c, [metric1, metric2] ],
     ...:   [ 3, a, [metric1, metric2] ],
     ...:   [ 3, c, [metric1, metric2] ],
     ...: ]
In [187]: 
In [187]: data
Out[187]: 
[[1, 0, [100, 200]],
 [1, 1, [100, 200]],
 [2, 1, [100, 200]],
 [2, 2, [100, 200]],
 [3, 0, [100, 200]],
 [3, 2, [100, 200]]]

In [189]: data = np.array(data,object)
In [190]: rows, row_pos = np.unique(data[:, 0], return_inverse=True)
     ...: cols, col_pos = np.unique(data[:, 1], return_inverse=True)
     ...: pivot_table = np.zeros((len(rows), len(cols)), dtype=object)

In [191]: pivot_table
Out[191]: 
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]], dtype=object)
In [192]: pivot_table[row_pos, col_pos] = data[:, 2]
In [193]: pivot_table
Out[193]: 
array([[list([100, 200]), list([100, 200]), 0],
       [0, list([100, 200]), list([100, 200])],
       [list([100, 200]), 0, list([100, 200])]], dtype=object)
In [194]: pivot_table[row_pos, col_pos]
Out[194]: 
array([list([100, 200]), list([100, 200]), list([100, 200]),
       list([100, 200]), list([100, 200]), list([100, 200])], dtype=object)
In [195]: _.shape
Out[195]: (6,)
In [196]: data[:,2].shape
Out[196]: (6,)

This assignment works because the source shape (and dtype) matches the target's, (6,).

In [197]: mask = pivot_table==0
In [198]: mask
Out[198]: 
array([[False, False,  True],
       [ True, False, False],
       [False,  True, False]])
In [199]: pivot_table[mask]
Out[199]: array([0, 0, 0], dtype=object)
In [200]: pivot_table[mask] = [0,0]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-200-83e0a7422802> in <module>()
----> 1 pivot_table[mask] = [0,0]

ValueError: NumPy boolean array indexing assignment cannot assign 2 input values to the 3 output values where the mask is true

This is a different error message (a different numpy version?), but it says I'm trying to put 2 values into 3 slots. It doesn't treat the [0,0] as a single item, but as 2.

There is no problem assigning a scalar element:

In [203]: pivot_table[mask] = None
In [204]: pivot_table
Out[204]: 
array([[list([100, 200]), list([100, 200]), None],
       [None, list([100, 200]), list([100, 200])],
       [list([100, 200]), None, list([100, 200])]], dtype=object)

In the past I've had success using frompyfunc to create object dtype arrays. Define a little function. I could have tested for 0 or for type, but since I've already inserted None, let's test for that:

In [205]: def fun(x):
     ...:     if x is None: return [0,0]
     ...:     return x

Apply it to each element of pivot_table, producing a new array.

In [230]: arr1 = np.frompyfunc(fun,1,1)(pivot_table)
In [231]: arr1
Out[231]: 
array([[list([100, 200]), list([100, 200]), list([0, 0])],
       [list([0, 0]), list([100, 200]), list([100, 200])],
       [list([100, 200]), list([0, 0]), list([100, 200])]], dtype=object)

As another approach, let's try to assign a list of lists:

In [240]: pivot_table[mask] = [[0,0] for _ in range(3)]    
TypeError: NumPy boolean array indexing assignment requires a 0 or 1-dimensional input, input has 2 dimensions

But if I try the same thing with where, it works:

In [241]: pivot_table[np.where(mask)] = [[0,0] for _ in range(3)]
In [242]: pivot_table
Out[242]: 
array([[list([100, 200]), list([100, 200]), list([0, 0])],
       [list([0, 0]), list([100, 200]), list([100, 200])],
       [list([100, 200]), list([0, 0]), list([100, 200])]], dtype=object)

With where, it's more like your original assignment to pivot_table.

In [243]: np.where(mask)
Out[243]: (array([0, 1, 2]), array([2, 0, 1]))

This array indexing can still have problems with broadcasting:

In [244]: pivot_table[np.where(mask)] = [0,0]
ValueError: cannot copy sequence with size 2 to array axis with dimension 3

Usually boolean mask indexing behaves like the equivalent np.where(mask) indexing, but evidently here the interplay of object dtype and broadcasting interferes with the boolean indexing.
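
If the broadcasting behaviour above is a concern, one way to sidestep it is to fill the masked cells one at a time. This is only a sketch of that alternative, not something from the session above: it rebuilds the same 3x3 table with an explicit loop and uses `np.argwhere` to list the masked positions. It is slower than a vectorised assignment, but each cell gets its own list.

```python
import numpy as np

# Rebuild the example table: lists where data exists, 0 elsewhere
pivot_table = np.zeros((3, 3), dtype=object)
for r, c in [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]:
    pivot_table[r, c] = [100, 200]

mask = pivot_table == 0

# Scalar assignment per position: no broadcasting is involved,
# and every masked cell receives a fresh list
for r, c in np.argwhere(mask):
    pivot_table[r, c] = [0, 0]

# All cells are now length-2 lists, so the 3D conversion works
p = np.array(pivot_table.tolist())
```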


Out[231] is still a (3,3) array, even though all elements are length-2 lists. To turn it into a numeric array we have to do something like:

In [248]: p = np.stack(pivot_table.ravel()).reshape(3,3,2)
In [249]: p
Out[249]: 
array([[[100, 200],
        [100, 200],
        [  0,   0]],

       [[  0,   0],
        [100, 200],
        [100, 200]],

       [[100, 200],
        [  0,   0],
        [100, 200]]])

np.concatenate (and the *stack variants) can join lists into an array, but it has to start with a list or flat array, hence the need for ravel and reshape.

np.array(pivot_table.tolist()) also works.
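
A small sketch comparing the two conversions on a made-up 2x2 object array (the name `table` and its values are illustrative only). It assumes every cell holds a list of the same length; otherwise neither route gives a clean numeric array.

```python
import numpy as np

# Hypothetical 2x2 object array whose cells are length-2 lists
table = np.empty((2, 2), dtype=object)
values = [[1, 2], [3, 4], [5, 6], [7, 8]]
for k, v in enumerate(values):
    table[k // 2, k % 2] = v

# Route 1: flatten, stack the lists, restore the leading dimensions
via_stack = np.stack(table.ravel()).reshape(table.shape + (2,))

# Route 2: nested Python lists, then a regular array
via_tolist = np.array(table.tolist())
```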


If instead you'd constructed a structured data array (assuming the metric values are numeric):

In [265]: data1 = np.array([tuple(x.tolist()) for x in data],'i,i,2i')
In [266]: data1
Out[266]: 
array([(1, 0, [100, 200]), (1, 1, [100, 200]), (2, 1, [100, 200]),
       (2, 2, [100, 200]), (3, 0, [100, 200]), (3, 2, [100, 200])],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4', (2,))])
In [267]: data1['f2']
Out[267]: 
array([[100, 200],
       [100, 200],
       [100, 200],
       [100, 200],
       [100, 200],
       [100, 200]], dtype=int32)

These values could then be assigned to a 3D pivot_table:

In [268]: p = np.zeros((len(rows), len(cols),2),int)
In [269]: p[row_pos, col_pos]=data1['f2']
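
With a 3D numeric pivot, the row of column sums that the question asks about becomes a plain reduction. A minimal self-contained sketch; the key arrays and metric values below are made up so that the sums are easy to check by hand.

```python
import numpy as np

# Hypothetical input, already split into keys and numeric metrics
row_keys = np.array([1, 1, 2, 2, 3, 3])
col_keys = np.array(['a', 'b', 'b', 'c', 'a', 'c'])
metrics = np.arange(12).reshape(6, 2)   # [[0, 1], [2, 3], ..., [10, 11]]

rows, row_pos = np.unique(row_keys, return_inverse=True)
cols, col_pos = np.unique(col_keys, return_inverse=True)

# Missing cells stay [0, 0], so they do not disturb the sums
p = np.zeros((len(rows), len(cols), 2), dtype=int)
p[row_pos, col_pos] = metrics

# Sum over the row axis: one [metric1, metric2] pair per column
col_sums = p.sum(axis=0)
```

Here `col_sums` has shape (3, 2), one pair of totals for each of the columns a, b and c.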

With the fillvalue array that Paul Panzer defined, your initial masked assignment works:

In [322]: fillvalue = np.empty((), 'O')
     ...: fillvalue[()] = [0, 0]
     ...: 
In [323]: fillvalue
Out[323]: array(list([0, 0]), dtype=object)
In [324]: mask
Out[324]: 
array([[False, False,  True],
       [ True, False, False],
       [False,  True, False]])
In [325]: pivot_table[mask] = fillvalue

Internally, np.full does np.copyto(a, fill_value, casting='unsafe'). Our masked assignment could also be written as np.copyto(pivot_table, fillvalue, where=mask).
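
A sketch of the np.copyto form, rebuilt as a standalone script rather than taken from the session above. As with np.full, the masked cells end up sharing one list object, so the earlier caveat about in-place modification still applies.

```python
import numpy as np

# Rebuild the example table: lists where data exists, 0 elsewhere
pivot_table = np.zeros((3, 3), dtype=object)
for r, c in [(0, 0), (0, 1), (1, 1), (1, 2), (2, 0), (2, 2)]:
    pivot_table[r, c] = [100, 200]
mask = pivot_table == 0

# 0-d object array so that [0, 0] is treated as a single item
fillvalue = np.empty((), dtype=object)
fillvalue[()] = [0, 0]

# Copy the single item into every position where the mask is True
np.copyto(pivot_table, fillvalue, where=mask)
```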

Your input data types are not clear, which can create inconvenience. Avoiding object dtype makes the data structure easier to analyze. Using a structured array can help:

Sample raw data:

import numpy as np
from numpy.random import randint, rand
n = 10
data = [[randint(5), 'abcdef'[randint(6)], rand(2)] for _ in range(n)]

Manually typing and filling:

dt = np.dtype([('i', 'i4'), ('j', 'U1'), ('val', 'f8', 2)])
arr = np.empty(len(data), dtype=dt)  # uninitialized; every field is filled below
for k, (a, b, c) in enumerate(data):
    arr[k]['i'] = a
    arr[k]['j'] = b
    arr[k]['val'] = c
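
The fill loop above could probably also be written as a single constructor call, since numpy accepts a list of tuples for a structured dtype. A hedged sketch with a few made-up records (the metric values are plain lists here rather than the output of rand):

```python
import numpy as np

# Hypothetical records in the same [row key, column key, metrics] layout
data = [
    [1, 'a', [0.5, 1.5]],
    [1, 'b', [2.0, 3.0]],
    [2, 'b', [4.0, 5.0]],
]

dt = np.dtype([('i', 'i4'), ('j', 'U1'), ('val', 'f8', (2,))])

# Each record must be a tuple, not a list, to be read as one row
arr = np.array([tuple(rec) for rec in data], dtype=dt)
```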

Now everything is easy:

row=arr['i']
col=arr['j']
val=arr['val']

(r, ri), (c, ci) = (np.unique(x, return_inverse=True) for x in (row, col))
res = np.zeros((len(r), len(c), 2))  # the right shape
res[ri, ci] = val

res is now:

[[[ 0.87  0.96]
  [ 0.03  0.92]
  [ 0.45  0.55]
  [ 0.    0.  ]
  [ 0.    0.  ]]

 [[ 0.27  0.84]
  [ 0.    0.  ]
  [ 0.41  0.05]
  [ 0.47  0.67]
  [ 0.    0.  ]]

 [[ 0.3   0.05]
  [ 0.    0.  ]
  [ 0.    0.  ]
  [ 0.    0.  ]
  [ 0.37  0.76]]

 [[ 0.    0.  ]
  [ 0.    0.  ]
  [ 0.    0.  ]
  [ 0.    0.  ]
  [ 0.4   0.07]]]
