How to group a dataframe and calculate the normalised standard deviation per group?

Question

I've got a dataframe that looks like this:

           product  cost_per_quantity
12779  0101010G0BB         160.788357
12653  0101010G0BC         179.493956
10390  0101010I0AA           0.425916
20361  0101010I0AA           0.603650
22504  0101010I0AA           0.633082

Created with:

import pandas as pd
df = pd.DataFrame({'product': ['0101010G0BB', '0101010G0BC', '0101010I0AA', '0101010I0AA', '0101010I0AA'], 'cost_per_quantity': [160.788357, 179.493956, 0.425916, 0.603650, 0.633082]})

Now I want to find the products with the maximum variation in cost_per_quantity.

So, for example, I'd like to examine the product 0101010I0AA, find the normalised standard deviation of cost_per_quantity across its three entries, and then compare it with the normalised standard deviation of the other products.

What's the best way to approach this? I tried:

import numpy as np
df1 = df.groupby('product').agg(np.std)

but that just gives me a bunch of NaNs.

Answer 1

For aggregation, df.groupby('product').agg(np.std) is correct, but for groups with a single observation it returns NaN, because the sample standard deviation (ddof=1) cannot be calculated from one observation. NumPy's default for standard deviation is the population standard deviation (ddof=0), but pandas overrides that: when np.std is passed to agg, pandas substitutes its own GroupBy.std, which defaults to ddof=1.

You can use the population standard deviation (ddof=0) instead to get 0 for those groups.
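A minimal sketch of the difference, assuming a recent pandas version and reusing the example data from the question: `std()` on a grouped column uses the sample standard deviation (ddof=1), while `std(ddof=0)` gives the population value. The names `grouped`, `sample_std` and `population_std` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['0101010G0BB', '0101010G0BC', '0101010I0AA', '0101010I0AA', '0101010I0AA'],
    'cost_per_quantity': [160.788357, 179.493956, 0.425916, 0.603650, 0.633082],
})

grouped = df.groupby('product')['cost_per_quantity']

# Sample standard deviation (pandas default, ddof=1): NaN for single-row groups
sample_std = grouped.std()

# Population standard deviation (ddof=0): 0.0 for single-row groups
population_std = grouped.std(ddof=0)

# Show group sizes next to both estimates
print(pd.DataFrame({
    'n': grouped.size(),
    'sample_std': sample_std,
    'population_std': population_std,
}))
```

Only 0101010I0AA has more than one row, so it is the only product where the two estimates are both defined and non-zero.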

If you want to see the deviation relative to the mean, you can use the coefficient of variation:

import numpy as np
df.groupby('product')['cost_per_quantity'].apply(lambda x: np.std(x) / np.mean(x))

Now that np.std is called inside a lambda, pandas no longer swaps in its own std, so NumPy's population standard deviation (ddof=0) is used and single-observation groups get 0 instead of NaN.
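The same coefficient of variation (population standard deviation divided by the group mean) can also be computed without `apply`, and sorting the result answers the original goal of finding the products with the most variation. This is a sketch assuming the question's example data; `cv`, `via_apply` and `ranked` are illustrative names, and ddof=0 is chosen to match the lambda version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'product': ['0101010G0BB', '0101010G0BC', '0101010I0AA', '0101010I0AA', '0101010I0AA'],
    'cost_per_quantity': [160.788357, 179.493956, 0.425916, 0.603650, 0.633082],
})

grouped = df.groupby('product')['cost_per_quantity']

# Coefficient of variation per product: population std (ddof=0) / mean.
# ddof=0 matches np.std's default, so single-row groups get 0.0, not NaN.
cv = grouped.std(ddof=0) / grouped.mean()

# The lambda-based version gives the same numbers.
via_apply = grouped.apply(lambda x: np.std(x) / np.mean(x))

# Rank products from most to least relative variation.
ranked = cv.sort_values(ascending=False)
print(ranked)

# Product with the largest relative variation.
print(cv.idxmax())
```

The built-in `std`/`mean` reductions avoid calling a Python function once per group, which tends to matter more as the number of products grows.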

Answer 1: 5 votes, accepted, 2016-04-04 10:52:25
