pandas groupby transform custom function
Is it possible to do a groupby transform with custom functions?
data = {
'a':['a1','a2','a3','a4','a5'],
'b':['b1','b1','b2','b2','b1'],
'c':[55,44.2,33.3,-66.5,0],
'd':[10,100,1000,10000,100000],
}
import pandas as pd
df = pd.DataFrame.from_dict(data)
df['e'] = df.groupby(['b'])['c'].transform(sum) #this works as expected
print (df)
# a b c d e
#0 a1 b1 55.0 10 99.2
#1 a2 b1 44.2 100 99.2
#2 a3 b2 33.3 1000 -33.2
#3 a4 b2 -66.5 10000 -33.2
#4 a5 b1 0.0 100000 99.2
def custom_calc(x, y):
    return (x * y)
#obviously wrong code here
df['e'] = df.groupby(['b'])['c'].transform(custom_calc(df['c'], df['d']))
As the example above shows, I want to explore whether a custom function can be passed into .transform().
I am aware that .apply() exists, but I want to find out if it is possible to use .transform() exclusively.
More importantly, I want to understand how to formulate a proper function that can be passed into .transform() for it to apply correctly.
PS: Currently, I know that default functions like 'count', sum, 'sum', etc. work.
One way I like to see what is happening is to create a small custom function that prints out what is passed in and its type. Then you can see what you have to work with.
def f(x):
    print(type(x))
    print('\n')
    print(x)
    print(x.index)
    return df.loc[x.index,'d']*x
df['f'] = df.groupby('b')['c'].transform(f)
print(df)
#Output from print statements in function
<class 'pandas.core.series.Series'>
0 55.0
1 44.2
4 0.0
Name: b1, dtype: float64
Int64Index([0, 1, 4], dtype='int64')
<class 'pandas.core.series.Series'>
2 33.3
3 -66.5
Name: b2, dtype: float64
Int64Index([2, 3], dtype='int64')
#End output from print statements in custom function
a b c d e f
0 a1 b1 55.0 10 99.2 550.0
1 a2 b1 44.2 100 99.2 4420.0
2 a3 b2 33.3 1000 -33.2 33300.0
3 a4 b2 -66.5 10000 -33.2 -665000.0
4 a5 b1 0.0 100000 99.2 0.0
Here, I am transforming on column 'c', but I make an "external" call to the dataframe object inside my custom function to get 'd'.
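For comparison, when the calculation only needs the group's own values, no external lookup is required. The sketch below is illustrative only: the helper names `demean` and `value_range` and the new column names are hypothetical, and the frame is a cut-down copy of the one in the question. It shows the two kinds of return value that .transform() accepts from a custom function: a Series the same length as the group, or a scalar that gets broadcast across the group.

```python
import pandas as pd

# Cut-down copy of the question's data (only the columns needed here).
df = pd.DataFrame({
    'b': ['b1', 'b1', 'b2', 'b2', 'b1'],
    'c': [55, 44.2, 33.3, -66.5, 0],
})

def demean(x):
    # x is the Series of 'c' values for one group;
    # the result has the same length and index as x.
    return x - x.mean()

def value_range(x):
    # Returns a single scalar per group; transform broadcasts it
    # to every row of that group.
    return x.max() - x.min()

df['c_demeaned'] = df.groupby('b')['c'].transform(demean)
df['c_range'] = df.groupby('b')['c'].transform(value_range)
print(df)
```

In both cases the result is aligned back to the original row order, which is why it can be assigned straight to a new column.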
You can also pass the "external" column name in as an argument, like this:
def f(x, col):
    return df.loc[x.index, col]*x
df['g'] = df.groupby('b')['c'].transform(f, col='d')
print(df)
Output:
a b c d e f g
0 a1 b1 55.0 10 99.2 550.0 550.0
1 a2 b1 44.2 100 99.2 4420.0 4420.0
2 a3 b2 33.3 1000 -33.2 33300.0 33300.0
3 a4 b2 -66.5 10000 -33.2 -665000.0 -665000.0
4 a5 b1 0.0 100000 99.2 0.0 0.0
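As a sanity check on the approach above, note that a row-wise product like this does not depend on the grouping at all, so the transform result should match a plain column multiplication. The sketch below rebuilds the relevant columns and compares the two; the variable names `via_transform` and `direct` are hypothetical and not part of the original answer.

```python
import pandas as pd

# Same values as in the question, limited to the columns used here.
df = pd.DataFrame({
    'b': ['b1', 'b1', 'b2', 'b2', 'b1'],
    'c': [55, 44.2, 33.3, -66.5, 0],
    'd': [10, 100, 1000, 10000, 100000],
})

def f(x, col):
    # x holds one group's 'c' values; its index is used to look up
    # the matching rows of another column in the outer dataframe.
    return df.loc[x.index, col] * x

# Extra keyword arguments to transform are forwarded to f.
via_transform = df.groupby('b')['c'].transform(f, col='d')

# Row-wise product computed without any grouping.
direct = df['c'] * df['d']

print((via_transform == direct).all())
```

The index-lookup pattern becomes more useful when the function mixes a per-group statistic with the second column, for example scaling 'd' by the group's mean of 'c'.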