Numpy array normalization by group ids
Suppose data and labels are numpy arrays as below:
import numpy as np
data=np.array([[0,4,5,6,8],[0,6,8,9],[1,9,5],[1,45,7],[1,8,3]], dtype=object) #Note: rows have different lengths, so recent NumPy versions need dtype=object here
labels=np.array([4,6,10,4,6])
The first element of each row in data is the id of a group. I want to normalize the labels based on the group ids (see the example below):
For example, the first two rows in data have id=0; thus, their labels must be:
normalized_labels[0]=labels[0]/(4+6)=0.4
normalized_labels[1]=labels[1]/(4+6)=0.6
The expected output should be:
normalized_labels=[0.4,0.6,0.5,0.2,0.3]
I have a naive solution:
ids=np.array([row[0] for row in data]) # a NumPy array, so that ids==i compares element-wise
out=[]
for i in np.unique(ids): # distinct ids in sorted order
    ind=np.where(ids==i)
    out.extend(list(labels[ind]/np.sum(labels[ind])))
out=np.array(out)
print(out)
Are there any numpy functions to perform such a task? Any suggestion is appreciated!
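One thing to watch for in the loop above: it appends results group by group, so the output lines up with labels only when rows of the same group are adjacent in data, as they are in the example. A minimal sketch of a loop that keeps the original order follows; the interleaved ids and labels here are made-up illustration values, not the data from the question.

```python
import numpy as np

# Hypothetical input in which the two groups are interleaved.
ids = np.array([1, 0, 1, 0])
labels = np.array([10, 4, 10, 6])

# Write each group's result back into its original positions
# instead of appending group by group.
out = np.empty(len(labels), dtype=float)
for i in np.unique(ids):
    mask = ids == i                      # rows belonging to group i
    out[mask] = labels[mask] / labels[mask].sum()

print(out)
```

Each entry of out then corresponds to the label at the same position, whatever the order of the groups.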
I found this somewhat subtle way to transform labels into per-group sums, using indices = [n[0] for n in data]. After that first line, data is no longer needed:
indices = [n[0] for n in data]
u, inv = np.unique(indices, return_inverse=True)
bincnt = np.bincount(inv, weights=labels)
sums = bincnt[inv]
Now sums is array([10., 10., 20., 20., 20.]). The rest is simple:
normalized_labels = labels / sums
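Put together, the steps above can be sketched as one small function. The name normalize_by_group and its signature are my own choice for illustration, not part of NumPy, and the sketch assumes that no group's labels sum to zero.

```python
import numpy as np

def normalize_by_group(group_ids, labels):
    """Divide each label by the sum of the labels that share its group id.

    Hypothetical helper for illustration; assumes no group sums to zero.
    """
    labels = np.asarray(labels, dtype=float)
    # Map arbitrary ids to 0..k-1 so that np.bincount can be used.
    _, inv = np.unique(group_ids, return_inverse=True)
    inv = inv.ravel()
    # One weighted sum per group, then expanded back to one value per row.
    group_sums = np.bincount(inv, weights=labels)
    return labels / group_sums[inv]

data = [[0, 4, 5, 6, 8], [0, 6, 8, 9], [1, 9, 5], [1, 45, 7], [1, 8, 3]]
labels = np.array([4, 6, 10, 4, 6])
group_ids = [row[0] for row in data]

normalized_labels = normalize_by_group(group_ids, labels)
print(normalized_labels)
```

For the data in the question this gives the expected values 0.4, 0.6, 0.5, 0.2, 0.3. Because the sums are indexed back through inv, the result keeps the original row order even when groups are not contiguous.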
Remarks.
np.bincount calculates weighted sums of items labeled 0, 1, 2, ... This is why the reindexing indices -> inv is needed. For example, indices = [8, 6, 4, 3, 4, 6, 8, 8] should be mapped into inv = [3, 2, 1, 0, 1, 2, 3, 3].
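The reindexing in this remark can be checked directly. The sketch below uses the example indices from above; the all-ones weights are only an illustrative choice that makes each group sum equal to the group size.

```python
import numpy as np

indices = np.array([8, 6, 4, 3, 4, 6, 8, 8])

# u holds the sorted distinct ids; inv gives, for every element of
# indices, its position in u, i.e. a relabeling to 0, 1, 2, ...
u, inv = np.unique(indices, return_inverse=True)
print(u)    # distinct ids: 3, 4, 6, 8
print(inv)  # relabeled ids: 3, 2, 1, 0, 1, 2, 3, 3

# With the compact labels, bincount returns exactly one sum per group.
weights = np.ones(len(indices))
sums_per_group = np.bincount(inv, weights=weights)

# Calling bincount on the raw ids also works for non-negative integers,
# but it allocates one slot for every value from 0 up to max(indices).
direct = np.bincount(indices, weights=weights)
print(len(sums_per_group), len(direct))
```

Here sums_per_group has 4 entries while direct has 9, most of them zero. The reindexed form also works when the ids are large, sparse, or not non-negative integers at all.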