pandas groupby apply the same function to multiple columns
I have a dataframe that looks like:
>>> df = pd.DataFrame({'id':[1,1,1,1,1,2,2,2],'month':[1,1,2,2,2,1,2,2],'value1':[1,1,3,3,5,6,7,7], 'value2': [9,10,11,12,12,14,15,15], 'others': range(8)})
>>> df
   id  month  value1  value2  others
0   1      1       1       9       0
1   1      1       1      10       1
2   1      2       3      11       2
3   1      2       3      12       3
4   1      2       5      12       4
5   2      1       6      14       5
6   2      2       7      15       6
7   2      2       7      15       7
I want to apply a custom function, whose input is a Series, to value1 and value2:
def get_most_common(srs):
    """
    Returns the most common value in a list. For ties, it returns whatever
    value collections.Counter.most_common(1) gives.
    """
    from collections import Counter
    x = list(srs)
    my_counter = Counter(x)
    most_common_value = my_counter.most_common(1)[0][0]
    return most_common_value
Expected result:
          value1  value2
id month
1  1           1       9
   2           3      12
2  1           6      14
   2           7      15
The function was written like that because initially I only had to apply it to a single column (value1), so df = df.groupby(['id', 'month'])['value1'].apply(get_most_common) worked. Now I have to apply it to two columns simultaneously.
Attempts:
df = df.groupby(['id', 'month'])[['value1', 'value2']].apply(get_most_common)
gave:
id  month
1   1        value1
    2        value1
2   1        value1
    2        value1
df = df.groupby(['id', 'month'])[['value1', 'value2']].transform(get_most_common)
gave this:
   value1  value2
0       1       9
1       1       9
2       3      12
3       3      12
4       3      12
5       6      14
6       7      15
7       7      15
applymap doesn't work. What am I missing here?
Use GroupBy.agg - it runs the function for each column separately:
df = df.groupby(['id', 'month'])[['value1', 'value2']].agg(get_most_common)
print(df)
          value1  value2
id month
1  1           1       9
   2           3      12
2  1           6      14
   2           7      15
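To see why apply returned the column name while agg returns the values, here is a minimal self-contained sketch built from the question's data. The names grouped, sub_df, labels and result are only illustrative, and get_most_common is condensed to one line with the same behavior as the question's version.

```python
from collections import Counter

import pandas as pd


def get_most_common(srs):
    """Return the most common value; ties go to whatever Counter.most_common(1) gives."""
    return Counter(list(srs)).most_common(1)[0][0]


df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 2, 2, 2],
    'month': [1, 1, 2, 2, 2, 1, 2, 2],
    'value1': [1, 1, 3, 3, 5, 6, 7, 7],
    'value2': [9, 10, 11, 12, 12, 14, 15, 15],
    'others': range(8),
})

# Select the two columns with a list (double brackets), not a bare tuple.
grouped = df.groupby(['id', 'month'])[['value1', 'value2']]

# apply hands each group to the function as a whole DataFrame. Iterating a
# DataFrame yields its column labels, so list(sub_df) is a list of names and
# the Counter ends up counting labels instead of values.
sub_df = df.loc[df['id'] == 1, ['value1', 'value2']]
labels = list(sub_df)
print(labels)

# agg hands each column of each group to the function as a Series, so
# list(srs) contains the actual values and one row per group comes back.
result = grouped.agg(get_most_common)
print(result)
```

transform also calls the function once per column, but it broadcasts each group's result back onto the original rows, which is why that attempt produced eight rows instead of four.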