Applying more than one function to a pandas dataframe
I'm looking for a way to combine the results of several apply functions run on my raw data. Here is some simplified code.
import pandas as pd
df = pd.DataFrame({'name':["alice","bob","charlene","alice","bob","charlene","alice","bob","charlene","edna" ],
'date':["2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-02","2020-01-01","2020-01-02","2020-01-01"],
'contribution': [5,5,10,20,30,1,5,5,10,100],
'payment-type': ["cash","transfer","cash","transfer","cash","transfer","cash","transfer","cash","transfer",]})
df['date'] = pd.to_datetime(df['date'])
def myfunction(input):
    output = input["name"].value_counts()
    output.index.set_names(['name_x'], inplace=True)
    return output
daily_count = df.groupby(pd.Grouper(key='date', freq='1D')).apply(myfunction)
print(daily_count.reset_index())
output:
date name_x name
0 2020-01-01 bob 3
1 2020-01-01 charlene 2
2 2020-01-01 alice 2
3 2020-01-01 edna 1
4 2020-01-02 charlene 1
5 2020-01-02 alice 1
I would like to merge the output of the following code into the previous result.
def myfunction(input):
    output = input["contribution"].sum()
    # output.index.set_names(['name_x'], inplace=True)
    return output
daily_count = df.groupby([pd.Grouper(key='date', freq='1D'), "name"]).apply(myfunction)
That would give me something like this:
date name num_contrubutions total_pp
0 2020-01-01 bob 3 25
1 2020-01-01 charlene 2 40
2 2020-01-01 alice 2 11
3 2020-01-01 edna 1 100
4 2020-01-02 charlene 1 5
5 2020-01-02 alice 1 10
Using apply() is important to me, because I plan to make some API calls and database lookups inside the function.
ta, Andrew
df.groupby(["date","name"])["contribution"].agg(["count","sum"]).reset_index().sort_values(by="count",ascending=False)
#output
date name count sum
1 2020-01-01 bob 3 40
0 2020-01-01 alice 2 25
2 2020-01-01 charlene 2 11
3 2020-01-01 edna 1 100
4 2020-01-02 alice 1 5
5 2020-01-02 charlene 1 10
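So first we group by date and name, then we select the column we want to aggregate. We count each person's contributions, then we sum them. After that, to get back to a regular dataframe shape, we call reset_index and then sort_values with by="count" in descending order.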
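As a minimal sketch of the same idea, named aggregation can give the two columns the names from the question's desired output directly (spelled num_contributions here). The sample data is copied from the question; the variable name result is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
             "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"],
    "contribution": [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
    "payment-type": ["cash", "transfer", "cash", "transfer", "cash",
                     "transfer", "cash", "transfer", "cash", "transfer"],
})
df["date"] = pd.to_datetime(df["date"])

# Named aggregation: each keyword becomes an output column name,
# and its value is the aggregation applied to "contribution".
result = (
    df.groupby(["date", "name"])["contribution"]
      .agg(num_contributions="count", total_pp="sum")
      .reset_index()
      .sort_values(by="num_contributions", ascending=False)
)
print(result)
```

This avoids renaming the columns afterwards, at the cost of requiring a pandas version that supports keyword-style named aggregation.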
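groupby-agg is very powerful in use cases like this one, where several single-column aggregation functions are computed in a single groupby. The syntax is flexible and straightforward, although it is not the most economical to type.
Limitation: an aggregation function cannot take more than one column as input. If you need that, you have to fall back to .apply().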
import numpy as np
def myfunc(sr):
    """Just a customized function for demo purposes"""
    # N.B. cannot write sr.sum() somehow
    return np.sum(sr) / (np.std(sr) + 1)
df_out = df.groupby([pd.Grouper(key='date', freq='D'), "name"]).agg({
    # column: [func1, func2, ...]
    "contribution": [np.size,  # accepts 1) a function
                     "sum",    # or 2) a built-in function name
                     myfunc    # or 3) an externally defined function
                     ],
    "payment-type": [
        lambda sr: len(np.unique(sr))  # or 4) a lambda function
    ]
})
# postprocess columns and indexes
df_out.columns = ["num_contrubutions", "total_pp", "myfunc", "type_count"]
df_out.reset_index(inplace=True)
# extra demo columns
date name num_contrubutions total_pp myfunc type_count
0 2020-01-01 alice 2 25 2.941176 2
1 2020-01-01 bob 3 40 3.128639 2
2 2020-01-01 charlene 2 11 2.000000 2
3 2020-01-01 edna 1 100 100.000000 1
4 2020-01-02 alice 1 5 5.000000 1
5 2020-01-02 charlene 1 10 10.000000 1
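For the case covered by the limitation above, where the function needs more than one column (or, as in the question, has to make API calls or database lookups), one possible approach is to have the function passed to apply() return a pd.Series, so that each entry becomes a column of the result. The sketch below is only an illustration: summarize is a hypothetical helper name, and the real lookups would replace the simple calculations inside it.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "name": ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
             "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"],
    "contribution": [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
    "payment-type": ["cash", "transfer", "cash", "transfer", "cash",
                     "transfer", "cash", "transfer", "cash", "transfer"],
})
df["date"] = pd.to_datetime(df["date"])

def summarize(group):
    """Hypothetical per-group function; it receives a DataFrame with every
    selected column, so API or database lookups could be done here."""
    return pd.Series({
        "num_contributions": group["contribution"].count(),
        "total_pp": group["contribution"].sum(),
        "type_count": group["payment-type"].nunique(),
    })

# Select the data columns explicitly so the function only sees those,
# then let apply() assemble one row per (date, name) group.
summary = (
    df.groupby(["date", "name"])[["contribution", "payment-type"]]
      .apply(summarize)
      .reset_index()
)
print(summary)
```

Because apply() calls the Python function once per group, it is usually slower than agg with built-in aggregations, so it is probably best kept for the parts that really need several columns or external calls.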