
Using apply or iterrows in Python instead of groupby?

import pandas as pd

df = pd.DataFrame([[15000, 2015], [20000, 2015], [25000, 2015], [19000, 2016], [36000, 2016], [20000, 2016], [10500, 2017], [54000, 2017], [34000, 2017]], columns=['income', 'year'])


income          year
15000           2015
20000           2015
25000           2015
19000           2016
36000           2016
20000           2016
10500           2017
54000           2017
34000           2017

Hello,

If I have a DataFrame like the one above and want to loop through each year and compute the median income for that year, how would I go about it?

Would the apply function or the groupby function be best?

I can get this to work:

df.groupby(df.year)[['income']].median()

I was wondering whether there is an alternative, such as apply or iterrows?

Many thanks.

df.groupby is the best way to go for per-group aggregations such as a median per year.

This is the right way to use it:

In [85]: df.groupby('year', as_index=False)['income'].median()
Out[85]: 
   year  income
0  2015   20000
1  2016   20000
2  2017   34000
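For comparison, here is a minimal sketch of the alternatives the question asks about, placed next to the groupby version. It is not part of the original answer; it assumes the income values shown in the question's table, and the names by_groupby, by_apply, by_loop and by_iterrows are made up for illustration.

```python
import pandas as pd

# Data as shown in the question's table (an assumption about which values are intended).
df = pd.DataFrame(
    {
        "income": [15000, 20000, 25000, 19000, 36000, 20000, 10500, 54000, 34000],
        "year": [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017],
    }
)

# Built-in groupby aggregation, as in the answer above.
by_groupby = df.groupby("year")["income"].median()

# groupby + apply: same result, but calls a Python function once per group.
by_apply = df.groupby("year")["income"].apply(lambda s: s.median())

# Explicit loop over the distinct years, filtering the frame each time.
by_loop = {
    year: df.loc[df["year"] == year, "income"].median()
    for year in df["year"].unique()
}

# iterrows: collect incomes per year row by row, then take each median.
buckets = {}
for _, row in df.iterrows():
    buckets.setdefault(row["year"], []).append(row["income"])
by_iterrows = {year: pd.Series(values).median() for year, values in buckets.items()}
```

All four approaches should agree on this data. The apply, loop and iterrows variants do more work in Python per group or per row, so they tend to be slower on large frames, which is one reason to prefer the plain groupby aggregation.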

After OP's comment:

In [239]: res = df.groupby('year', as_index=False)['income'].median()
In [259]: d = res.set_index('year').to_dict()['income']

Then you can query the dict d to get the median for a given year, like this:

In [268]: d.get(2015)
Out[268]: 20000

In [269]: d.get(2016) 
Out[269]: 20000

In [270]: d.get(2017) 
Out[270]: 34000
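If the only goal is a year-to-median lookup, the intermediate as_index=False and set_index steps may not be needed. The following is a sketch of a shorter route, again assuming the income values from the question's table; median_by_year is a hypothetical name.

```python
import pandas as pd

# Data as shown in the question's table (an assumption about which values are intended).
df = pd.DataFrame(
    {
        "income": [15000, 20000, 25000, 19000, 36000, 20000, 10500, 54000, 34000],
        "year": [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017],
    }
)

# groupby leaves 'year' as the index of the result, so Series.to_dict()
# maps each year directly to its median income.
median_by_year = df.groupby("year")["income"].median().to_dict()

# Lookup works like the d.get(...) calls above; a missing year returns None.
income_2015 = median_by_year.get(2015)
income_missing = median_by_year.get(1999)
```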
