
Pandas dataframe groupby with aggregation

I have a Pandas dataframe with thousands of rows and these columns:

Name    Job   Department   Salary    Date 

I want to return a new dataframe with two columns:

Unique_Job     Avg_Salary

The code I use to accomplish this:

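# Group by a plain column name; a one-element list makes newer pandas yield tuple keys
jobs = df.groupby('Job')
avg_by_job = {}  # named so it does not shadow the built-in dict
for job, group in jobs:
    avg_by_job[job] = group['Salary'].mean()
dfJobs = pd.DataFrame(list(avg_by_job.items()), columns=['Unique_Job', 'Avg_Salary'])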

However, I know there must be a better way. Ideas? Thanks.

Yes, use the aggregate method of the groupby object.

jobs = df.groupby('Job').aggregate({'Salary': 'mean'})

There's even the mean method as a shortcut:

jobs = df.groupby('Job')['Salary'].mean()
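
To make the difference between the two calls concrete, here is a small sketch on made-up data. The sample rows and the names agg_result and mean_result are assumptions for illustration, not part of the question. With a dict, aggregate returns a DataFrame with a Salary column; selecting the column and calling mean returns a Series. Both are indexed by Job.

```python
import pandas as pd

# Hypothetical sample data with the columns named in the question
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Job": ["Engineer", "Engineer", "Analyst", "Analyst"],
    "Department": ["R&D", "R&D", "Finance", "Finance"],
    "Salary": [100.0, 120.0, 80.0, 90.0],
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"],
})

# aggregate with a dict: a DataFrame with one column, indexed by Job
agg_result = df.groupby("Job").aggregate({"Salary": "mean"})

# column selection + mean: a Series, indexed by Job
mean_result = df.groupby("Job")["Salary"].mean()

print(type(agg_result).__name__, type(mean_result).__name__)
print(mean_result)
```

Either form carries the same averages; which one is more convenient depends on whether a DataFrame or a Series is needed next.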

See http://pandas.pydata.org/pandas-docs/stable/groupby.html for more info and lots of examples.

As you already have the means, I guess you are struggling to build the new dataframe from the series you get as output. You can use the Series.to_frame() and DataFrame.reset_index() methods to make a dataframe with two columns, and then you only need to rename the columns. Like this:

jobs = df.groupby('Job')['Salary'].mean()
jobs = jobs.to_frame().reset_index()
jobs.columns = ['Unique_Job', 'Avg_Salary']
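
As a possible alternative to the three steps above, named aggregation can name the result column during the aggregation itself, so only the grouping column still needs renaming. This is a sketch on made-up data: the sample rows are assumptions, and the keyword form of agg assumes pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical sample data (only the columns needed here)
df = pd.DataFrame({
    "Job": ["Engineer", "Engineer", "Analyst", "Analyst"],
    "Salary": [100.0, 120.0, 80.0, 90.0],
})

jobs = (
    df.groupby("Job")
      .agg(Avg_Salary=("Salary", "mean"))     # named aggregation: output column = (input column, function)
      .reset_index()                          # move Job from the index into a regular column
      .rename(columns={"Job": "Unique_Job"})  # match the column name asked for in the question
)

print(jobs)
```

The result has one row per job, with the columns Unique_Job and Avg_Salary.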

