Copying the result of a function applied after groupby into a pandas column

Question

I am trying to write the pandas equivalent of the following data.table operations:

dt <- data.table(id = 1:10, x = rnorm(40))
dt <- dt[order(id)]
dt[, diff_x := c(0,diff(x)), by = id]

head(dt, 12)

# output:
    id           x      diff_x
 1:  1  0.01419519  0.00000000
 2:  1 -0.39539869 -0.40959388
 3:  1 -0.43918689 -0.04378821
 4:  1 -0.79905967 -0.35987278
 5:  2  0.59555572  0.00000000
 6:  2 -0.21933639 -0.81489211
 7:  2 -0.65462968 -0.43529329
 8:  2  0.99307684  1.64770652
 9:  3 -1.31185544  0.00000000
10:  3  1.23649358  2.54834902
11:  3  0.66359594 -0.57289764
12:  3  1.77078647  1.10719053

First of all, I am not sure how to do a diff with the zero padding shown above in an easy way, so I wrote my own function for that. More importantly, I am not sure how to copy the result of my groupby operation back into my pandas dataframe as a new column (the way I easily do above with data.table). Here is what I have tried so far:

import numpy as np
import pandas as pd

def diff_pad(vect):
    # first difference, padded with a leading 0
    return np.concatenate([[0], np.diff(vect)])

df = pd.DataFrame()
df['id'] = list(range(1, 11)) * 4
df.sort_values('id', inplace=True)
df['x'] = np.random.rand(40)

diffz = df.groupby('id')['x'].apply(diff_pad)

df['diffz'] = diffz
print(df.head(10))

#out:
    id         x                                              diffz
0    1  0.757153                                                NaN
30   1  0.869001                                                NaN
10   1  0.140684  [0.0, 0.362003972215, -0.742119725957, -0.0684...
20   1  0.791483                                                NaN
21   2  0.941333                                                NaN
1    2  0.504867  [0.0, 0.111848720078, -0.728317633944, 0.65079...
31   2  0.273321                                                NaN
11   2  0.118802                                                NaN
2    3  0.848048  [0.0, -0.436465430463, -0.231545666932, -0.154...
12   3  0.357192                                                NaN

Edit:

In R/data.table, I can apply an arbitrary function that takes any columns of the table, grouped by another set of columns, and assign the result to a new column.

E.g.:

library(data.table)

dt <- data.table(id = 1:10, x = rnorm(40), y = rnorm(40))
dt <- dt[order(id)]

my_funct <- function(x, y) {
  return(sqrt(max(x)^2 + min(y)^2))
}

dt[, z := my_funct(x, y), by = id]

head(dt, 12)


# out:

    id           x          y         z
 1:  1  0.26012913  0.7612974 1.2433969
 2:  1  1.19113080  1.4228528 1.2433969
 3:  1 -0.07970657 -0.3567118 1.2433969
 4:  1 -0.33129374  0.7879845 1.2433969
 5:  2  0.60868698  0.9716669 0.8872687
 6:  2 -0.72751776  0.0392282 0.8872687
 7:  2 -0.17724141  0.2599093 0.8872687
 8:  2  0.13324134 -0.6455587 0.8872687
 9:  3 -1.91015664 -1.1340993 2.2408919
10:  3 -0.95696559 -0.2624625 2.2408919
11:  3  1.93272221  0.2788335 2.2408919
12:  3  0.46391776 -0.9080321 2.2408919

How would I do something like that in pandas?

Answer 1

First off, welcome to pandas!

Second, I'd start off by defining df like this. This is a style preference of mine and by no means canonical.

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
        id=np.repeat(np.arange(1, 11), 4),
        x=np.random.randn(40)
    ))

Lastly, if I understood you correctly:

df['x_diff'] = df.groupby('id').x.diff().fillna(0)
df

You could also have used apply with your own function, like this:

def my_diff(x):
    return x.diff().fillna(0)

# select the column first; group_keys=False keeps the original row index
df.groupby('id', group_keys=False)['x'].apply(my_diff)
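
The same pattern extends to the function in the question's edit, which takes two columns and returns one value per group. Here is a rough sketch, assuming made-up sample data and a Python port of my_funct (neither is from the original post): compute one value per id with apply, then broadcast it back onto the rows with map.

```python
import numpy as np
import pandas as pd

def my_funct(x, y):
    # Python port of the R function in the question's edit (an assumption,
    # written for illustration).
    return np.sqrt(x.max() ** 2 + y.min() ** 2)

# Small made-up frame so the result is easy to check by hand.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "x": [3.0, 1.0, 6.0, 2.0],
    "y": [4.0, 5.0, -8.0, 1.0],
})

# One value per id: apply sees each group's x and y columns together.
per_group = df.groupby("id")[["x", "y"]].apply(lambda g: my_funct(g["x"], g["y"]))

# Broadcast the per-group value back onto every row of that group.
df["z"] = df["id"].map(per_group)
print(df)
```

If the function only combines built-in reductions, df.groupby('id')['x'].transform('max') and similar calls may let you skip apply altogether.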

The reason yours didn't work is that you returned a numpy array with no index values to line up with the pandas Series your function was being applied to. You can see in your results that the answer is there, but it's crammed into a single cell per group.
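
To make the alignment point concrete, here is a minimal sketch in which diff_pad keeps its numpy arithmetic but wraps the result in a Series carrying the group's index. The small frame and its values are made up for illustration. With the index attached, transform hands back a column that lines up with df row by row.

```python
import numpy as np
import pandas as pd

def diff_pad(vect):
    # Same arithmetic as the question's function, but the result keeps the
    # group's index so pandas can align it with the original rows.
    values = np.concatenate([[0.0], np.diff(vect.to_numpy())])
    return pd.Series(values, index=vect.index)

# Made-up data: three ids with three rows each.
df = pd.DataFrame({
    "id": np.repeat(np.arange(1, 4), 3),
    "x": [1.0, 4.0, 2.0, 10.0, 10.5, 7.0, -1.0, 0.0, 3.0],
})

# transform returns a result indexed like df, so it can be assigned directly.
df["diffz"] = df.groupby("id")["x"].transform(diff_pad)
print(df)
```

Each group starts at 0 and the remaining rows hold the within-group differences, which should match df.groupby('id').x.diff().fillna(0).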
