Pandas DataFrame: group by and summarize

Question

I have a Pandas dataframe with thousands of rows and these columns:

Name    Job   Department   Salary    Date

I want to return a new df with two columns:

Unique_Job     Avg_Salary

The code I use to accomplish this:

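jobs = df.groupby('Job')  # a plain string key, so each group key is a scalar
avg_by_job = {}  # named to avoid shadowing the built-in dict
for job, group in jobs:
    avg_by_job[job] = group['Salary'].mean()
dfJobs = pd.DataFrame(list(avg_by_job.items()), columns=['Unique_Job', 'Avg_Salary'])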

However, I know there must be a better way. Ideas? Thanks.

Answer 1

Yes, use the aggregate method of the groupby object.

jobs = df.groupby('Job').aggregate({'Salary': 'mean'})

There's even the mean method as a shortcut:

jobs = df.groupby('Job')['Salary'].mean()

See http://pandas.pydata.org/pandas-docs/stable/groupby.html for more info and lots of examples.
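The two forms above give the same averages but return different types, which matters for what comes next. Here is a minimal sketch of the difference, using a small made-up dataframe (the names and salaries are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical sample data; only the Job and Salary columns matter here.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Dee"],
    "Job": ["Analyst", "Engineer", "Analyst", "Engineer"],
    "Salary": [50000, 80000, 60000, 100000],
})

# aggregate() with a dict returns a DataFrame indexed by Job.
as_frame = df.groupby("Job").aggregate({"Salary": "mean"})

# Selecting the column first and calling mean() returns a Series indexed by Job.
as_series = df.groupby("Job")["Salary"].mean()

print(type(as_frame).__name__)
print(type(as_series).__name__)
print(as_series["Analyst"])
```

In both cases Job ends up in the index rather than in a regular column, so neither result is yet the two-column dataframe the question asks for.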

Answer 2

Since you already have the means, I guess you are struggling to build the new dataframe from the Series you get as output. You can use the Series.to_frame() and DataFrame.reset_index() methods to make a dataframe with two columns, and then you only need to rename the columns. Like this:

jobs = df.groupby('Job')['Salary'].mean()
jobs = jobs.to_frame().reset_index()
jobs.columns = ['Unique_Job', 'Avg_Salary']
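For anyone who wants to try this end to end, here is a sketch on a small made-up dataframe (the sample values are hypothetical). It also shows one possible alternative that is not part of the answer above: named aggregation with as_index=False, which I believe needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({
    "Job": ["Analyst", "Engineer", "Analyst", "Engineer"],
    "Salary": [50000, 80000, 60000, 100000],
})

# Steps from the answer: Series -> DataFrame -> Job moved from index to column.
jobs = df.groupby("Job")["Salary"].mean()
jobs = jobs.to_frame().reset_index()
jobs.columns = ["Unique_Job", "Avg_Salary"]

# Alternative sketch (assumes pandas >= 0.25 for named aggregation):
# as_index=False keeps Job as a regular column, so no reset_index() is needed.
jobs_alt = (
    df.groupby("Job", as_index=False)
      .agg(Avg_Salary=("Salary", "mean"))
      .rename(columns={"Job": "Unique_Job"})
)

print(jobs)
print(jobs_alt)
```

Both results should have one row per job with the columns Unique_Job and Avg_Salary. Which one to use is mostly a matter of taste.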

2 answers

Answer 1
2 2016-02-13 22:30:33

Answer 2
1 Accepted 2016-02-13 22:42:00
