Sum numpy array values based on labels in a separate array
I have arrays similar to the following:
a=[["tennis","tennis","golf","federer","cricket"],
["federer","nadal","woods","sausage","federer"],
["sausage","lion","prawn","prawn","sausage"]]
I also have the following matrix of weights:
w=[[1,3,3,4,5],
[2,3,2,3,4],
[1,2,1,1,1]]
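What I want to do is sum the weights per label within each row of a, and then take the three labels with the largest totals for that row. So in the end I would like something like this: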
res=[["cricket","tennis","federer"],
["federer","sausage","nadal"],
["lion","sausage","prawn"]]
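To make the expected result concrete, the per-label totals for each row can be tallied with a plain dictionary. This is only an illustrative sketch; the helper name label_totals is hypothetical and not part of the original data or code.

```python
from collections import defaultdict

a = [["tennis", "tennis", "golf", "federer", "cricket"],
     ["federer", "nadal", "woods", "sausage", "federer"],
     ["sausage", "lion", "prawn", "prawn", "sausage"]]
w = [[1, 3, 3, 4, 5],
     [2, 3, 2, 3, 4],
     [1, 2, 1, 1, 1]]

def label_totals(labels, weights):
    """Sum the weights that share the same label (hypothetical helper)."""
    totals = defaultdict(int)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return dict(totals)

for labels, weights in zip(a, w):
    print(label_totals(labels, weights))
# {'tennis': 4, 'golf': 3, 'federer': 4, 'cricket': 5}
# {'federer': 6, 'nadal': 3, 'woods': 2, 'sausage': 3}
# {'sausage': 2, 'lion': 2, 'prawn': 2}
```

Several labels tie in every row (for example tennis and federer at 4 in the first row), so the relative order of tied labels in res is arbitrary.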
In my real dataset ties are unlikely and not really a concern. There is also the case where an entire row holds a single label, like this:
["federer","federer","federer","federer","federer"]
Ideally I would like this to be returned as ["federer", "", ""].
Any guidance would be much appreciated.
For numpy arrays, see piRSquared's answer.
Here is a pure Python approach:
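for i in range(len(a)):
    if a[i].count(a[i][0]) == len(a[i]):
        # Every label in the row is identical: return it once, padded with blanks
        res = [a[i][0], "", ""]
    else:
        # Rank labels by individual weight (weights of repeated labels are not summed here)
        res = [x[0] for x in sorted(zip(a[i], w[i]), key=lambda c: c[1], reverse=True)[:3]]
    print(res)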
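The loop above ranks individual entries, so a label that appears twice is not credited with its combined weight. A minimal pure Python sketch that sums the weights per label first might look like the following; the function name top_labels and its k and pad parameters are assumptions made for illustration.

```python
from collections import defaultdict

def top_labels(labels, weights, k=3, pad=""):
    """Return the k labels with the largest summed weight, padded with `pad`."""
    totals = defaultdict(float)
    for label, weight in zip(labels, weights):
        totals[label] += weight
    # sorted() is stable, so tied labels keep their first-seen order
    ranked = sorted(totals, key=totals.get, reverse=True)[:k]
    return ranked + [pad] * (k - len(ranked))

a = [["tennis", "tennis", "golf", "federer", "cricket"],
     ["federer", "nadal", "woods", "sausage", "federer"],
     ["sausage", "lion", "prawn", "prawn", "sausage"]]
w = [[1, 3, 3, 4, 5],
     [2, 3, 2, 3, 4],
     [1, 2, 1, 1, 1]]

res = [top_labels(row_a, row_w) for row_a, row_w in zip(a, w)]
print(res)
# [['cricket', 'tennis', 'federer'],
#  ['federer', 'nadal', 'sausage'],
#  ['sausage', 'lion', 'prawn']]
```

A row made up of a single repeated label, such as ["federer"] * 5, comes back as ["federer", "", ""]. Tied labels appear in the order they are first seen, which is why the second and third rows differ from the res shown in the question only in the order of ties.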
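Try:
import pandas as pd

# a and w must be DataFrames for .loc and .iterrows() to work
a, w = pd.DataFrame(a), pd.DataFrame(w)
print(pd.DataFrame(
    {i: a.loc[i, row.sort_values(ascending=False).index[:3]].values for i, row in w.iterrows()}
).T)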
         0        1      2
0  cricket  federer   golf
1  federer  sausage  nadal
2     lion  sausage  prawn
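Note that the first row of that output contains golf, because the weights are ranked entry by entry and not summed per label. A hedged sketch of a pandas variant that groups the weights by label before ranking is shown below; the helper name top3 is hypothetical, and the order among tied labels should be treated as unspecified.

```python
import pandas as pd

a = pd.DataFrame([["tennis", "tennis", "golf", "federer", "cricket"],
                  ["federer", "nadal", "woods", "sausage", "federer"],
                  ["sausage", "lion", "prawn", "prawn", "sausage"]])
w = pd.DataFrame([[1, 3, 3, 4, 5],
                  [2, 3, 2, 3, 4],
                  [1, 2, 1, 1, 1]])

def top3(labels, weights, k=3):
    """Sum one row's weights per label and return the top-k labels, padded with ''."""
    # Group the weight Series by the label values that sit in the same positions
    totals = weights.groupby(labels.to_numpy()).sum()
    best = totals.sort_values(ascending=False).index[:k].tolist()
    return best + [""] * (k - len(best))

res = pd.DataFrame([top3(a.loc[i], w.loc[i]) for i in a.index])
print(res)
```

With the sample data, the first column should be cricket, federer and one of the three tied labels of the last row; the remaining columns hold the tied runners-up.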
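I managed to get it working with the code below:
import numpy as np

def myf(a, w):
    # Map each label to an integer code, then sum the weights per code
    lookupTable, indexed_dataSet = np.unique(a, return_inverse=True)
    y = np.bincount(indexed_dataSet, w)
    # Labels ordered by descending total weight, keeping at most three
    res = lookupTable[y.argsort()][::-1][:3]
    # Pad to length 3 by repeating the last label found
    ret = np.empty(3, dtype=res.dtype)
    ret.fill(res[-1])
    ret[0:res.shape[0]] = res
    return ret

a = np.asarray(a)  # my real label array is called knearest_labels
result = np.empty_like(a[:, 0:3])
for i, (x, y) in enumerate(zip(a, w)):
    result[i] = myf(x, y)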
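The function above pads short rows by repeating the last label. If the blank padding described in the question is preferred, a small variation along the following lines could be used. This is a sketch under assumptions: the name top3_labels and the k and pad parameters are made up for illustration, and ties are broken alphabetically because np.unique returns sorted labels.

```python
import numpy as np

def top3_labels(labels, weights, k=3, pad=""):
    """Top-k labels by summed weight for one row, padded with `pad`."""
    uniq, inverse = np.unique(labels, return_inverse=True)
    totals = np.bincount(inverse, weights=weights)
    # Negate so the largest totals come first; a stable sort keeps ties alphabetical
    order = np.argsort(-totals, kind="stable")[:k]
    out = np.full(k, pad, dtype=uniq.dtype)
    out[:order.size] = uniq[order]
    return out

a = np.array([["tennis", "tennis", "golf", "federer", "cricket"],
              ["federer", "nadal", "woods", "sausage", "federer"],
              ["sausage", "lion", "prawn", "prawn", "sausage"]])
w = np.array([[1, 3, 3, 4, 5],
              [2, 3, 2, 3, 4],
              [1, 2, 1, 1, 1]])

res = np.array([top3_labels(x, y) for x, y in zip(a, w)])
print(res)
# [['cricket' 'federer' 'tennis']
#  ['federer' 'nadal' 'sausage']
#  ['lion' 'prawn' 'sausage']]
```

A row containing only "federer" yields ['federer' '' ''] with this version.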