Pandas dataframe groupby with aggregation
I have a Pandas dataframe with thousands of rows and these columns:
Name Job Department Salary Date
I want to return a new dataframe with two columns:
Unique_Job Avg_Salary
The code I use to accomplish this:
jobs = df.groupby('Job')
avg_by_job = {}  # named avg_by_job rather than dict, which would shadow the built-in
for a, b in jobs:
    avg_by_job.update({a: b['Salary'].mean()})
dfJobs = pd.DataFrame(list(avg_by_job.items()), columns=['Unique_Job', 'Avg_Salary'])
However, I know there must be a better way. Ideas? Thanks.
Yes, use the aggregate method of the groupby object.
jobs = df.groupby('Job').aggregate({'Salary': 'mean'})
There's even the mean method as a shortcut:
jobs = df.groupby('Job')['Salary'].mean()
See http://pandas.pydata.org/pandas-docs/stable/groupby.html for more info and lots of examples.
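To make the difference between the two calls concrete, here is a minimal sketch. The sample rows are made up for illustration; only the Job and Salary column names come from the question. It suggests that the aggregate form gives back a DataFrame, while the mean shortcut on a single column gives back a Series, both indexed by Job.

```python
import pandas as pd

# Hypothetical sample data; only the Job and Salary column names come from the question.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Job": ["Engineer", "Engineer", "Analyst", "Analyst"],
    "Salary": [100.0, 120.0, 80.0, 90.0],
})

# aggregate with a dict returns a DataFrame indexed by Job, with a Salary column.
by_agg = df.groupby("Job").aggregate({"Salary": "mean"})

# Selecting the column first and calling mean() returns a Series indexed by Job.
by_mean = df.groupby("Job")["Salary"].mean()

print(type(by_agg).__name__, type(by_mean).__name__)
```

In both cases Job ends up in the index rather than in a regular column, which is why a further step is needed to get a plain two-column dataframe.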
Since you already have the means, I guess you are struggling to build the new dataframe from the Series you get as output. You can use the Series.to_frame() and DataFrame.reset_index() methods to make a dataframe with two columns; then you only need to rename the columns. Like this:
jobs = df.groupby('Job')['Salary'].mean()
jobs = jobs.to_frame().reset_index()
jobs.columns = ['Unique_Job', 'Avg_Salary']
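As a possible alternative, the same result can probably be reached in one chained expression using named aggregation. This is a sketch, not the answer's original method: it assumes pandas 0.25 or later (where the keyword form of agg is available), and the sample rows are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the Job and Salary column names come from the question.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid", "Dee"],
    "Job": ["Engineer", "Engineer", "Analyst", "Analyst"],
    "Salary": [100.0, 120.0, 80.0, 90.0],
})

jobs = (
    df.groupby("Job")
      .agg(Avg_Salary=("Salary", "mean"))    # named aggregation: output column = (input column, function)
      .reset_index()                         # move Job from the index back into a regular column
      .rename(columns={"Job": "Unique_Job"})
)

print(jobs)
```

Renaming with a mapping instead of assigning to jobs.columns avoids depending on column order, at the cost of a slightly longer chain.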