Torch/numpy: groupby pandas alternative
Is there an efficient way to rewrite the following code so that it uses torch / numpy instead, without installing and importing pandas? I am used to working with pandas, so I wrote it like this, but I am trying to learn numpy and torch, so I am looking for alternative solutions that do not use pandas.
import pandas as pd
import torch

bins = torch.LongTensor(3072).random_(0, 35)
weights = torch.rand((3072))
df = pd.DataFrame({'weights': weights.numpy(), 'bins': bins.numpy()})
bins_sum = df.groupby('bins').weights.sum().values
So, basically: how can I get the sum of weights grouped by bins without using pandas?
You can compute the unique elements of bins via torch.unique (these are the values to group by) and then use boolean index masks to access the corresponding elements in weights:
# Sorted unique bin values, one per group
unique = torch.unique(bins)
result = torch.zeros(unique.size(), dtype=weights.dtype)
for i, val in enumerate(unique):
    # Sum the weights whose bin equals the current value
    result[i] += weights[bins == val].sum()
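The loop above runs once per unique bin value. As a possible loop-free variant (a minimal sketch, not part of the original answer), torch.unique can also return the position of each element within the unique values, and index_add_ can then accumulate all weights in one call. The seed and the use of torch.randint are assumptions made only to keep the example self-contained.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only for reproducibility
bins = torch.randint(0, 35, (3072,))
weights = torch.rand(3072)

# `unique` holds the sorted group keys; `inverse[k]` is the index in
# `unique` of the group that element k belongs to.
unique, inverse = torch.unique(bins, return_inverse=True)

# Accumulate every weight into the slot of its group in a single call.
result = torch.zeros(unique.numel(), dtype=weights.dtype)
result.index_add_(0, inverse, weights)
```

When the bins are non-negative integers, as they are here, torch.bincount(bins, weights=weights) should give the same sums, with the difference that it also reports a zero for any bin value that never occurs, whereas the groupby result simply omits it.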
numpy has isin, which is similar to pandas' isin.
A pandas groupby selects the data (rows) that belong to each group and then applies a function to each group.
import numpy as np

def groupby(data, bin_data, grouper, agg):
    '''
    data: numpy array
    bin_data: numpy array holding the bin of each element in data
    grouper: callable that returns a list of (key, array of values) pairs
    agg: callable to be applied to each group
    '''
    res = {}
    for key, arr in grouper(data, bin_data):
        res[key] = agg(arr)
    return res

# Find the indices where `bins == b` and then use them to select the `arry` values
# (iterate over the unique bin values so that each group is built only once)
bin_grouper = lambda arry, bvalue: [(b, arry[np.isin(bvalue, b)]) for b in np.unique(bvalue)]

# Compute result
gdata = groupby(weights.numpy(), bins.numpy(), bin_grouper, np.sum)
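If only the sum is needed, a vectorized numpy variant may be simpler than a generic groupby helper. The sketch below is an alternative, not the method of the answer above: it combines np.unique with return_inverse=True and np.bincount with its weights argument. The random generator and seed are assumptions used only to make the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
bins = rng.integers(0, 35, size=3072)
weights = rng.random(3072)

# `keys` are the sorted unique bin values; `inverse[k]` is the index in
# `keys` of the group that element k belongs to.
keys, inverse = np.unique(bins, return_inverse=True)

# Sum the weights per group index in one pass.
sums = np.bincount(inverse, weights=weights)

# Same mapping shape as the dict returned by groupby(): bin value -> sum.
grouped = dict(zip(keys.tolist(), sums.tolist()))
```

Going through inverse rather than calling np.bincount on bins directly keeps the result aligned with keys even when the bin values are negative or sparse.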