
Torch/numpy: groupby pandas alternative

Is there an efficient way to rewrite the following code so that it avoids installing and importing pandas and uses torch / numpy instead? I am used to working with pandas, so I wrote it like this, but I am trying to learn numpy and torch, so I am looking for alternative solutions that do not use pandas.

import pandas as pd
import torch

bins = torch.LongTensor(3072).random_(0, 35)
weights = torch.rand((3072))
df = pd.DataFrame({'weights': weights.numpy(), 'bins': bins.numpy()})
bins_sum = df.groupby('bins').weights.sum().values

So, basically: how can I get the sum of weights grouped by bins without using pandas?

You can compute the unique elements of bins via torch.unique (the values to group by) and then use boolean index masks to access the corresponding elements in weights:

unique = torch.unique(bins)
result = torch.zeros(unique.size(), dtype=weights.dtype)
for i, val in enumerate(unique):
    result[i] += weights[bins == val].sum()
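
The loop above builds one boolean mask per unique value. As a possible loop-free variant (a sketch, not part of the original answer), torch.unique can also return the group index of every element, and index_add_ can then accumulate the weights by that index. The seed and the variable names `inverse` and `bins_sum` are chosen here for illustration only:

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only to make the sketch reproducible
bins = torch.randint(0, 35, (3072,))
weights = torch.rand(3072)

# `unique` holds the sorted distinct bin values; `inverse[i]` is the position
# of bins[i] inside `unique`, i.e. the group index of element i.
unique, inverse = torch.unique(bins, return_inverse=True)

# Add each weight into the slot of its group.
bins_sum = torch.zeros(unique.numel(), dtype=weights.dtype)
bins_sum.index_add_(0, inverse, weights)
```

Because `unique` is sorted, `bins_sum` should come out in the same order as the pandas groupby result, with one entry per bin value that actually occurs.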

numpy has isin, which is similar to the isin method in pandas (Series.isin / DataFrame.isin).

Pandas groupby selects the data (rows) belonging to each group and applies a function to that group.

import numpy as np

def groupby(data, bin_data, grouper, agg):
  '''
    data:     numpy array of values to aggregate
    bin_data: numpy array of bin labels, same length as data
    grouper:  callable that returns a list of (key, array of values) pairs
    agg:      callable applied to each group's values
  '''

  res = {}
  for key, arr in grouper(data, bin_data):
    # Store the aggregated value under its group key
    res[key] = agg(arr)

  return res

# For each distinct bin value `b`, find the positions where `bvalue == b` and use them to select the `arry` values
bin_grouper = lambda arry, bvalue: [(b, arry[np.isin(bvalue, b)]) for b in np.unique(bvalue)]

# Compute result: a dict mapping each bin value to the sum of its weights
gdata = groupby(weights.numpy(), bins.numpy(), bin_grouper, np.sum)
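
If only the grouped sum is needed, a shorter numpy-only route may be enough. The following is a minimal sketch (not from the original answers) that combines np.unique with np.bincount and its weights argument. The random generator, the seed, and the name `inverse` are assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
bins = rng.integers(0, 35, size=3072)
weights = rng.random(3072)

# Map every bin label to a dense group index 0..k-1 (in sorted label order).
unique, inverse = np.unique(bins, return_inverse=True)

# bincount with `weights` sums the weights that share the same index.
bins_sum = np.bincount(inverse, weights=weights)
```

When the bin labels are already small non-negative integers, as in the question, np.bincount(bins, weights=weights) can be called directly. In that case bins that never occur show up as 0 in the result instead of being dropped.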
