
Product of array elements by group in numpy (Python)

I'm trying to build a function that returns the products of subsets of array elements. Basically I want to build a prod_by_group function that does this:

values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])

Vprods = prod_by_group(values, groups)

And the resulting Vprods should be:

Vprods
array([6, 4, 30])

There's a great answer here for sums of elements, and I think the solution should be similar: https://stackoverflow.com/a/4387453/1085691

I tried taking the log first, then sum_by_group, then exp, but ran into numerical issues.
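The post does not say which numerical issues came up. A minimal sketch of the log/sum/exp idea, assuming the groups are already sorted and using np.add.reduceat as a stand-in for sum_by_group (the name starts is hypothetical), shows two likely ones: the result comes back as floating point rather than exact integers, and np.log is undefined for zero or negative values.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])  # assumed to be sorted already

# Index of the first element of each group
starts = np.concatenate(([0], np.flatnonzero(groups[:-1] != groups[1:]) + 1))

# Sum the logs per group, then exponentiate: a float approximation of the product
approx = np.exp(np.add.reduceat(np.log(values), starts))

# Direct integer product per group, for comparison
exact = np.multiply.reduceat(values, starts)

print(approx.dtype, exact.dtype)
print(np.allclose(approx, exact))
```

The two results agree only up to floating-point rounding, so the log route needs an extra rounding step for integer data and cannot handle non-positive inputs at all.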

There are some other similar answers here for the min and max of elements by group: https://stackoverflow.com/a/8623168/1085691

Edit: Thanks for the quick answers! I'm trying them out. I should add that I want it to be as fast as possible (that's the reason I'm trying to do it in numpy in some vectorized way, like the examples I gave).

Edit: I evaluated all the answers given so far, and the best one is given by @seberg below. Here's the full function that I ended up using:

def prod_by_group(values, groups):
    order = np.argsort(groups)
    groups = groups[order]
    values = values[order]
    group_changes = np.concatenate(([0], np.where(groups[:-1] != groups[1:])[0] + 1))
    return np.multiply.reduceat(values, group_changes)
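As a quick check, here is the same function applied to group labels that are not sorted. The input arrays are made up for illustration; the products come back ordered by sorted group label.

```python
import numpy as np

def prod_by_group(values, groups):
    # Sort both arrays by group label so each group is contiguous
    order = np.argsort(groups)
    groups = groups[order]
    values = values[order]
    # Start index of each group in the sorted arrays
    group_changes = np.concatenate(([0], np.where(groups[:-1] != groups[1:])[0] + 1))
    return np.multiply.reduceat(values, group_changes)

# Illustrative unsorted input: group 1 -> {1, 2, 3}, group 2 -> {4}, group 3 -> {5, 6}
values = np.array([5, 1, 4, 2, 6, 3])
groups = np.array([3, 1, 2, 1, 3, 1])

result = prod_by_group(values, groups)
print(result)  # [ 6  4 30]
```

Because multiplication is commutative, it does not matter that np.argsort's default sort may reorder elements within a group.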

If your groups are already sorted, you can do this using the reduceat functionality of ufuncs (if they are not sorted, you have to sort them first, for example with np.argsort, to do it efficiently):

# you could do the group_changes somewhat faster if you care a lot
group_changes = np.concatenate(([0], np.where(groups[:-1] != groups[1:])[0] + 1))
Vprods = np.multiply.reduceat(values, group_changes)

Or use mgilson's answer if you have few groups. But if you have many groups, this is much more efficient, since you avoid building a boolean index over every element of the original array for every group. Plus, reduceat lets you avoid slicing in a Python loop.
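To make the index computation concrete, here is a step-by-step sketch on the arrays from the question. The intermediate names boundary and starts are introduced only for illustration.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])  # already sorted

# True wherever the label differs from the next one
boundary = groups[:-1] != groups[1:]   # [False, False, True, True, False]

# A new group starts one position after each boundary; the first group starts at 0
starts = np.concatenate(([0], np.where(boundary)[0] + 1))  # [0, 3, 4]

# reduceat multiplies values[0:3], values[3:4] and values[4:]
Vprods = np.multiply.reduceat(values, starts)
print(Vprods)  # [ 6  4 30]
```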

Of course pandas does these operations conveniently.

Edit: Sorry, I had prod in there. The ufunc is multiply. You can use this method for any binary ufunc. This means it works for basically all numpy functions that work elementwise on two input arrays (i.e. multiply normally multiplies two arrays elementwise, add adds them, maximum/minimum, etc.).
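For instance, the same group start indices can be reused with other binary ufuncs. This is a small sketch on the question's data; the variable names are chosen here for illustration.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])  # already sorted

starts = np.concatenate(([0], np.where(groups[:-1] != groups[1:])[0] + 1))

sums = np.add.reduceat(values, starts)      # per-group sum
maxs = np.maximum.reduceat(values, starts)  # per-group maximum
mins = np.minimum.reduceat(values, starts)  # per-group minimum

print(sums)  # [ 6  4 11]
print(maxs)  # [3 4 6]
print(mins)  # [1 4 5]
```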

First set up a mask for the groups such that you expand the groups in another dimension:

mask = (groups == np.unique(groups).reshape(-1, 1))
mask
array([[ True,  True,  True, False, False, False],
       [False, False, False,  True, False, False],
       [False, False, False, False,  True,  True]], dtype=bool)

Now we multiply with values:

mask * values
array([[1, 2, 3, 0, 0, 0],
       [0, 0, 0, 4, 0, 0],
       [0, 0, 0, 0, 5, 6]])

Now you can already take the product along axis 1, except for those zeros, which are easy to fix by replacing every masked-out entry with 1. Selecting on mask rather than on mask * values keeps any genuine zeros in the data:

np.prod(np.where(mask, values, 1), axis=1)
array([ 6,  4, 30])

As suggested in the comments, you can also use the Pandas module. Using the groupby() function, this task becomes a one-liner:

import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])

df = pd.DataFrame({'values': values, 'groups': groups})

So df then looks as follows:

   groups  values
0       1       1
1       1       2
2       1       3
3       2       4
4       3       5
5       3       6

Now you can groupby() the groups column and apply numpy's prod() function to each of the groups like this:

 df.groupby(groups)['values'].apply(np.prod)

which gives you the desired output:

1     6
2     4
3    30
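As an alternative sketch, assuming a reasonably recent pandas, the group-by object also has its own prod() aggregation, so the apply(np.prod) step can be replaced. Grouping by the column name 'groups' here, rather than by the external array, is a choice made for this example.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3, 4, 5, 6])
groups = np.array([1, 1, 1, 2, 3, 3])

df = pd.DataFrame({'values': values, 'groups': groups})

# Group by the 'groups' column and use the built-in product aggregation
prods = df.groupby('groups')['values'].prod()

# Convert back to a plain numpy array if the index is not needed
Vprods = prods.to_numpy()
print(Vprods)  # [ 6  4 30]
```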

Well, I doubt this is a great answer, but it's the best I can come up with:

np.array([np.prod(values[np.flatnonzero(groups == x)]) for x in np.unique(groups)])

It's not a numpy solution, but it's fairly readable (I find that sometimes numpy solutions aren't!):

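from functools import reduce  # reduce is no longer a builtin in Python 3
from operator import itemgetter, mul
from itertools import groupby

# itertools.groupby only merges consecutive equal keys, so groups must already be sorted
grouped = groupby(zip(groups, values), itemgetter(0))
prods = [reduce(mul, map(itemgetter(1), vals), 1) for key, vals in grouped]
print(prods)
# [6, 4, 30]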
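itertools.groupby only merges consecutive items with equal keys, so unsorted labels need a sort first. A minimal sketch with plain Python lists (the input data is made up for illustration):

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter, mul

# Illustrative unsorted input
values = [5, 1, 4, 2, 6, 3]
groups = [3, 1, 2, 1, 3, 1]

# Sort the (group, value) pairs by group so equal labels become adjacent
pairs = sorted(zip(groups, values), key=itemgetter(0))

# Multiply the values within each run of equal labels
prods = [
    reduce(mul, (value for _, value in run), 1)
    for _, run in groupby(pairs, key=itemgetter(0))
]
print(prods)  # [6, 4, 30]
```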
