How to find mean with group by or pivot table in pandas dataframe?

Question

I am using the Salaries.csv dataset, which you can find at https://www.kaggle.com/kaggle/sf-salaries/data. I am trying to find the job titles that have more than 500 data points, and then calculate the mean TotalPayBenefits for each of those job titles. The output should print the top 10 earning job titles.

What I did:

import pandas as pd

salaries = pd.read_csv('Salaries.csv')
salaries = salaries.drop(["Id", "Notes", "Status", "Agency"], axis = 1)
salaries = salaries.dropna()
salaries.head()

jobtitlelist = (salaries.JobTitle.value_counts()>500)[0:10]
data_10jobtitle = salaries[salaries.JobTitle.isin(jobtitlelist.index)]
avgsalary_10jobtitle = data_10jobtitle.groupby(by=data_10jobtitle.JobTitle).TotalPayBenefits.mean()
print(avgsalary_10jobtitle)

My output is

I think I am missing something small, because I do not get the expected output.

Answer 1

You need to change this line:

jobtitlelist = salaries.JobTitle.value_counts()[(salaries.JobTitle.value_counts()>500)][0:10]
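
To see why the original line misbehaves, here is a small sketch on made-up data (the job titles and the threshold of 2 are assumptions for illustration, not values from the Kaggle file). Comparing the counts with a threshold returns a boolean Series that still has every job title in its index, so taking `.index` of it filters nothing. Indexing the counts with that boolean Series keeps only the titles that pass.

```python
import pandas as pd

# Toy data standing in for salaries.JobTitle (hypothetical values).
titles = pd.Series(["Nurse"] * 4 + ["Clerk"] * 3 + ["Pilot"] * 1, name="JobTitle")

counts = titles.value_counts()   # Nurse: 4, Clerk: 3, Pilot: 1
mask = counts > 2                # boolean Series; its index still holds all three titles

unfiltered_index = list(mask.index)         # what the question's line ends up using
filtered_index = list(counts[mask].index)   # what the corrected line produces

print(unfiltered_index)
print(filtered_index)
```

With the mask applied, "Pilot" drops out because it occurs only once.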

Answer 2

In this line:

jobtitlelist = (salaries.JobTitle.value_counts()>500)[0:10]

You first find the job titles that have more than 500 records, then you take the first 10 of them, and those are used to compute the average total pay benefits. So your workflow is:

1) keep only job titles that have more than 500 records
2) take the first 10 job titles
3) compute average total pay

But based on your question, your workflow should be

1) keep only job titles that have more than 500 records
2) compute average total pay of jobs from step 1)
3) sort average total pay in descending order
4) the top 10 rows of the resulting dataframe will be what you are looking for
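
The workflow above could be sketched roughly as follows. This is a minimal illustration on a made-up DataFrame, not the Kaggle file: the rows, `MIN_COUNT = 2`, and the variable names are assumptions, and with the real data you would load `Salaries.csv` and use a threshold of 500. The averages are sorted from highest to lowest so that the first rows are the top earners. A `pivot_table` variant is included because the question title asks about it.

```python
import pandas as pd

# Toy stand-in for the real data (hypothetical rows).
salaries = pd.DataFrame({
    "JobTitle": ["Nurse"] * 3 + ["Clerk"] * 3 + ["Pilot"] * 1,
    "TotalPayBenefits": [100.0, 120.0, 110.0, 50.0, 60.0, 70.0, 900.0],
})

MIN_COUNT = 2   # assumption for the toy data; 500 in the question
TOP_N = 10

# Step 1: keep only rows whose job title occurs more than MIN_COUNT times.
counts = salaries["JobTitle"].value_counts()
frequent_titles = counts[counts > MIN_COUNT].index
frequent = salaries[salaries["JobTitle"].isin(frequent_titles)]

# Steps 2-4: mean per title, sorted from highest to lowest, first TOP_N rows.
top_paid = (
    frequent.groupby("JobTitle")["TotalPayBenefits"]
    .mean()
    .sort_values(ascending=False)
    .head(TOP_N)
)
print(top_paid)

# The same averages computed with pivot_table instead of groupby.
pivot = frequent.pivot_table(index="JobTitle", values="TotalPayBenefits", aggfunc="mean")
top_paid_pivot = pivot["TotalPayBenefits"].sort_values(ascending=False).head(TOP_N)
```

Here "Pilot" has the highest pay but is excluded because it appears only once, which is the effect of filtering by count before averaging.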

2 answers

solution1
0 ACCEPTED 2018-03-16 22:08:12

solution2
0 2018-03-16 22:09:34
