Using apply or iterrows in python instead of groupby?
import pandas as pd
df = pd.DataFrame([[15000, 2015], [20000, 2015], [25000, 2015], [15000, 2016], [20000, 2016], [25000, 2016], [10500, 2017], [54000, 2017], [34000, 2017]], columns=['income', 'year'])
income year
15000 2015
20000 2015
25000 2015
19000 2016
36000 2016
20000 2016
10500 2017
54000 2017
34000 2017
Hello,
If I have a dataframe like the one above and I want to loop through each year in Python and compute a median income value for each year, how would I go about it?
Would the apply function or the groupby function be best?
I can get this to work:
df.groupby(df.year)[['income']].median()
I was wondering whether there was an alternative such as apply or iterrows?
Many thanks.
df.groupby is the best way to go when you are doing certain aggregations.
This is the right way to use it:
In [85]: df.groupby('year', as_index=False)['income'].median()
Out[85]:
year income
0 2015 20000
1 2016 20000
2 2017 34000
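On the question of alternatives, here is a minimal sketch, assuming the df built in the question, of how the same medians could be computed with apply, with an explicit loop over the groups, or with iterrows. The variable names (via_groupby, via_apply, via_loop, via_iterrows) are illustrative and not part of the original post. All of these should agree with groupby(...).median(); they are just more verbose and typically slower on large frames.

```python
import pandas as pd

df = pd.DataFrame(
    [[15000, 2015], [20000, 2015], [25000, 2015],
     [15000, 2016], [20000, 2016], [25000, 2016],
     [10500, 2017], [54000, 2017], [34000, 2017]],
    columns=['income', 'year'])

# Reference result: the built-in aggregation from the answer
via_groupby = df.groupby('year')['income'].median()

# Alternative 1: apply on the grouped column (this still relies on groupby)
via_apply = df.groupby('year')['income'].apply(lambda s: s.median())

# Alternative 2: explicit Python loop over the (year, sub-frame) pairs
via_loop = {year: group['income'].median()
            for year, group in df.groupby('year')}

# Alternative 3: iterrows, collecting the incomes of each year by hand
incomes_by_year = {}
for _, row in df.iterrows():
    incomes_by_year.setdefault(row['year'], []).append(row['income'])
via_iterrows = {year: pd.Series(values).median()
                for year, values in incomes_by_year.items()}
```

The iterrows version walks the frame row by row in Python, so it is usually the slowest of the four; it is shown only because the question asks about it.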
After OP's comment:
In [239]: res = df.groupby('year', as_index=False)['income'].median()
In [259]: d = res.set_index('year').to_dict()['income']
Then you can query the above dict d to get the median for a certain year, like this:
In [268]: d.get(2015)
Out[268]: 20000
In [269]: d.get(2016)
Out[269]: 20000
In [270]: d.get(2017)
Out[270]: 34000
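A possible shortcut, sketched under the assumption that df is the frame from the question: if year is left as the index (the default, without as_index=False), Series.to_dict() gives the year-to-median mapping directly. The name median_by_year is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [[15000, 2015], [20000, 2015], [25000, 2015],
     [15000, 2016], [20000, 2016], [25000, 2016],
     [10500, 2017], [54000, 2017], [34000, 2017]],
    columns=['income', 'year'])

# groupby keeps 'year' as the index, so to_dict() maps year -> median
median_by_year = df.groupby('year')['income'].median().to_dict()

print(median_by_year.get(2015))
print(median_by_year.get(2018))  # None: dict.get returns None for a missing year
```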