Using apply or iterrows instead of groupby in Python?

Question

import pandas as pd

df = pd.DataFrame(
    [[15000, 2015], [20000, 2015], [25000, 2015],
     [15000, 2016], [20000, 2016], [25000, 2016],
     [10500, 2017], [54000, 2017], [34000, 2017]],
    columns=['income', 'year'])


income          year
15000           2015
20000           2015
25000           2015
15000           2016
20000           2016
25000           2016
10500           2017
54000           2017
34000           2017

Hello,

If I have a dataframe like the one above and I want to loop through each year in Python and compute the median income for each year, how would I go about it?

Would apply or groupby be the better choice?

I can get this to work:

df.groupby(df.year)[['income']].median()

I was wondering whether there is an alternative, such as apply or iterrows?

Many thanks.

Answer 1

df.groupby is the best way to go when you are computing per-group aggregations such as a median.
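
For comparison, here is a minimal sketch of the alternatives the question mentions, assuming the df from the question. The names by_groupby, by_apply, incomes and by_loop are made up for illustration. All three approaches should give the same medians; the built-in aggregation is the shortest and avoids a Python-level loop over rows, which is why it is usually preferred.

```python
import pandas as pd

df = pd.DataFrame(
    [[15000, 2015], [20000, 2015], [25000, 2015],
     [15000, 2016], [20000, 2016], [25000, 2016],
     [10500, 2017], [54000, 2017], [34000, 2017]],
    columns=['income', 'year'])

# 1) Built-in aggregation on the grouped column.
by_groupby = df.groupby('year')['income'].median()

# 2) apply on the grouped column: the lambda is called once per year.
by_apply = df.groupby('year')['income'].apply(lambda s: s.median())

# 3) Explicit loop with iterrows: collect incomes per year,
#    then take the median of each list.
incomes = {}
for _, row in df.iterrows():
    incomes.setdefault(int(row['year']), []).append(row['income'])
by_loop = {year: pd.Series(values).median() for year, values in incomes.items()}

print(by_groupby.to_dict())
print(by_apply.to_dict())
print(by_loop)
```

The iterrows version works, but it needs its own bookkeeping and visits every row in Python, so it tends to be the slowest of the three on larger frames.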

This is the right way to use groupby:

In [85]: df.groupby('year', as_index=False)['income'].median()
Out[85]: 
   year  income
0  2015   20000
1  2016   20000
2  2017   34000
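
To make the effect of as_index concrete, the sketch below compares the two forms, assuming the same df as in the question. The names as_series and as_frame are illustrative only. By default the grouping key becomes the index and selecting a single column gives a Series; with as_index=False the key stays an ordinary column and the result is a DataFrame.

```python
import pandas as pd

df = pd.DataFrame(
    [[15000, 2015], [20000, 2015], [25000, 2015],
     [15000, 2016], [20000, 2016], [25000, 2016],
     [10500, 2017], [54000, 2017], [34000, 2017]],
    columns=['income', 'year'])

# Default: 'year' becomes the index of the result.
as_series = df.groupby('year')['income'].median()

# as_index=False: 'year' is kept as a regular column.
as_frame = df.groupby('year', as_index=False)['income'].median()

print(type(as_series).__name__, list(as_series.index))
print(type(as_frame).__name__, list(as_frame.columns))
```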

After OP's comment:

In [239]: res = df.groupby('year', as_index=False)['income'].median()
In [259]: d = res.set_index('year').to_dict()['income']

Then you can query the dict d to get the median for a given year, like this:

In [268]: d.get(2015)
Out[268]: 20000

In [269]: d.get(2016) 
Out[269]: 20000

In [270]: d.get(2017) 
Out[270]: 34000
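
If the only goal is a lookup by year, a slightly shorter route is possible. This is a sketch under the assumption that the same df is used; medians is an illustrative name. Leaving out as_index=False yields a Series indexed by year, which can be converted to a dict directly or queried with .loc.

```python
import pandas as pd

df = pd.DataFrame(
    [[15000, 2015], [20000, 2015], [25000, 2015],
     [15000, 2016], [20000, 2016], [25000, 2016],
     [10500, 2017], [54000, 2017], [34000, 2017]],
    columns=['income', 'year'])

# Series of medians, indexed by year.
medians = df.groupby('year')['income'].median()

# Same kind of dict as d above, built without the intermediate DataFrame.
d = medians.to_dict()

print(d.get(2015))        # median income for 2015
print(medians.loc[2017])  # label-based lookup on the Series itself
print(d.get(2018))        # None: dict.get returns None for a missing year
```

Note that medians.loc raises a KeyError for a year that is not present, whereas d.get returns None.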

Answer 1
1 2020-05-28 15:07:51
