Copying results of a function apply after groupby into a pandas column

I am trying to do a pandas equivalent of the following data.table operations:

dt <- data.table(id = 1:10, x = rnorm(40))
dt <- dt[order(id)]
dt[, diff_x := c(0,diff(x)), by = id]

head(dt, 12)

# output:
    id           x      diff_x
 1:  1  0.01419519  0.00000000
 2:  1 -0.39539869 -0.40959388
 3:  1 -0.43918689 -0.04378821
 4:  1 -0.79905967 -0.35987278
 5:  2  0.59555572  0.00000000
 6:  2 -0.21933639 -0.81489211
 7:  2 -0.65462968 -0.43529329
 8:  2  0.99307684  1.64770652
 9:  3 -1.31185544  0.00000000
10:  3  1.23649358  2.54834902
11:  3  0.66359594 -0.57289764
12:  3  1.77078647  1.10719053

First of all, I am not sure how to do a diff with the zero padding I used above in a simple way, so I wrote my own function for it. More importantly, I am not sure how to copy the result of my groupby operation back into my pandas DataFrame as a new column (which is easy with data.table, as shown above). Here is what I have tried so far:

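import numpy as np
import pandas as pd

def diff_pad(vect):
    # Prepend a 0 so the result has the same length as the input.
    return np.concatenate([[0], np.diff(vect)])

df = pd.DataFrame()
df['id'] = list(range(1, 11)) * 4
df.sort_values('id', inplace=True)  # df.sort() was removed in later pandas versions
df['x'] = np.random.rand(40)

# One NumPy array per id, indexed by id rather than by row.
diffz = df.groupby('id')['x'].apply(diff_pad)

df['diffz'] = diffz
print(df.head(10))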

#out:
    id         x                                              diffz
0    1  0.757153                                                NaN
30   1  0.869001                                                NaN
10   1  0.140684  [0.0, 0.362003972215, -0.742119725957, -0.0684...
20   1  0.791483                                                NaN
21   2  0.941333                                                NaN
1    2  0.504867  [0.0, 0.111848720078, -0.728317633944, 0.65079...
31   2  0.273321                                                NaN
11   2  0.118802                                                NaN
2    3  0.848048  [0.0, -0.436465430463, -0.231545666932, -0.154...
12   3  0.357192                                                NaN

Edit:

In R/data.table, I can apply an arbitrary function that takes any columns of the table, grouped by another set of columns, and assign the result to a new column.

E.g.:

library(data.table)

dt <- data.table(id = 1:10, x = rnorm(40), y = rnorm(40))
dt <- dt[order(id)]

my_funct <- function(x, y) {
  return(sqrt(max(x)^2 + min(y)^2))
}

dt[, z := my_funct(x, y), by = id]

head(dt, 12)


# out:

    id           x          y         z
 1:  1  0.26012913  0.7612974 1.2433969
 2:  1  1.19113080  1.4228528 1.2433969
 3:  1 -0.07970657 -0.3567118 1.2433969
 4:  1 -0.33129374  0.7879845 1.2433969
 5:  2  0.60868698  0.9716669 0.8872687
 6:  2 -0.72751776  0.0392282 0.8872687
 7:  2 -0.17724141  0.2599093 0.8872687
 8:  2  0.13324134 -0.6455587 0.8872687
 9:  3 -1.91015664 -1.1340993 2.2408919
10:  3 -0.95696559 -0.2624625 2.2408919
11:  3  1.93272221  0.2788335 2.2408919
12:  3  0.46391776 -0.9080321 2.2408919

How would I do something like that in pandas?

First off, welcome to pandas!

Second, I'd start off by defining df like this. This is a style preference of mine and by no means canonical.

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(
        id=np.repeat(np.arange(1, 11), 4),
        x=np.random.randn(40)
    ))

Lastly, if I understood you correctly:

df['x_diff'] = df.groupby('id').x.diff().fillna(0)
df
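
To make the effect of that one-liner easy to check by hand, here is a minimal sketch on a tiny frame. The values are made up for illustration and are not from the question.

```python
import pandas as pd

# Small hand-made frame (hypothetical values) so the result is easy to verify.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "x": [1.0, 4.0, 2.0, 10.0, 7.0, 8.0],
})

# diff() within each id leaves NaN in the first row of every group;
# fillna(0) turns that NaN into the 0 padding used in the data.table version.
df["x_diff"] = df.groupby("id")["x"].diff().fillna(0)

print(df["x_diff"].tolist())
# [0.0, 3.0, -2.0, 0.0, -3.0, 1.0]
```

The grouped diff keeps the original row index, which is why it can be assigned straight back as a column.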

You could also have used apply with your own function, like this:

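def my_diff(x):
    return x.diff().fillna(0)

# Select the x column so only x is differenced;
# group_keys=False keeps the original row index on the result.
df.groupby('id', group_keys=False).x.apply(my_diff)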
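
For the two-column my_funct from the question's edit, one possible sketch is to compute a single value per group with apply and then map it back onto the rows through the id column. The sample values and the names per_group and z are assumptions chosen for illustration. This is one way to do it, not necessarily the most idiomatic.

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen so the expected values are easy to compute by hand.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "x": [3.0, -1.0, 0.0, -6.0],
    "y": [4.0, 5.0, 8.0, 9.0],
})

def my_funct(x, y):
    # Same formula as the R function in the question.
    return np.sqrt(x.max() ** 2 + y.min() ** 2)

# One scalar per group, returned as a Series indexed by id.
per_group = df.groupby("id")[["x", "y"]].apply(
    lambda g: my_funct(g["x"], g["y"])
)

# Broadcast each group's scalar back to its rows via the id column.
df["z"] = df["id"].map(per_group)

print(df["z"].tolist())
# [5.0, 5.0, 8.0, 8.0]
```

Mapping through id sidesteps the index-alignment problem because the per-group result is looked up by group key rather than by row label.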

Yours didn't work because you returned a NumPy array with no index values to line up with the pandas Series your function was being applied to. You can see in your results that the answer is there, but each group's array is crammed into a single cell.
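
If you would rather keep the original diff_pad, a minimal sketch is to call it through transform instead of apply. As far as I know, transform re-attaches each group's index to the array the function returns, so the values line up with the original rows. The sample values are made up, and the rows are deliberately interleaved to show that alignment holds.

```python
import numpy as np
import pandas as pd

def diff_pad(vect):
    # Returns a bare NumPy array, as in the question: it carries no index.
    return np.concatenate([[0], np.diff(vect)])

# Hypothetical data with the two ids interleaved.
df = pd.DataFrame({
    "id": [1, 2, 1, 2, 1, 2],
    "x": [1.0, 10.0, 4.0, 7.0, 2.0, 8.0],
})

# transform expects one value per row of the group and puts each value
# back at the row it came from.
df["diffz"] = df.groupby("id")["x"].transform(diff_pad)

print(df["diffz"].tolist())
# [0.0, 0.0, 3.0, -3.0, -2.0, 1.0]
```

An alternative with the same effect would be to have the function return pd.Series(result, index=vect.index), so that the result carries its own index.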
