I am using the Salaries.csv dataset, which you can find at https://www.kaggle.com/kaggle/sf-salaries/data. I am trying to find the job titles that have more than 500 data points. After that, I want to calculate the mean TotalPayBenefits for each of those job titles. Finally, I want to print the 10 job titles with the highest mean TotalPayBenefits.
What I did:
import pandas as pd
salaries = pd.read_csv('Salaries.csv')
salaries = salaries.drop(["Id", "Notes", "Status", "Agency"], axis = 1)
salaries = salaries.dropna()
salaries.head()
jobtitlelist = (salaries.JobTitle.value_counts()>500)[0:10]
data_10jobtitle = salaries[salaries.JobTitle.isin(jobtitlelist.index)]
avgsalary_10jobtitle = data_10jobtitle.groupby(by=data_10jobtitle.JobTitle).TotalPayBenefits.mean()
print(avgsalary_10jobtitle)
I think I am missing something small, because I do not get the expected output.
You need to change this line so that the boolean comparison is actually used to filter the counts:
jobtitlelist = salaries.JobTitle.value_counts()[(salaries.JobTitle.value_counts()>500)][0:10]
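To see why the original line does not filter anything, here is a minimal sketch on a hypothetical toy DataFrame (made-up titles and values, with a threshold of 2 standing in for 500). `(counts > THRESHOLD)` is only a Series of True/False values indexed by every title, so its index still contains titles that fail the test; indexing `counts` with it keeps only the passing titles.

```python
import pandas as pd

# Hypothetical toy data standing in for the Kaggle file.
salaries = pd.DataFrame({
    "JobTitle": ["A", "A", "A", "B", "B", "B", "C", "C", "D"],
    "TotalPayBenefits": [10, 20, 30, 100, 110, 120, 500, 600, 900],
})
THRESHOLD = 2  # stands in for 500

counts = salaries.JobTitle.value_counts()

# Question's expression: a boolean Series over *every* title.
# .iloc[0:10] is the positional equivalent of [0:10].
unmasked = (counts > THRESHOLD).iloc[0:10]

# Answer's expression: the boolean Series is used as a mask,
# so only titles with more than THRESHOLD rows survive.
masked = counts[counts > THRESHOLD].iloc[0:10]

print(sorted(unmasked.index))  # ['A', 'B', 'C', 'D']
print(sorted(masked.index))    # ['A', 'B']
```

Passing `unmasked.index` to `isin` therefore keeps titles C and D as well, which is what happens with `jobtitlelist.index` in the question.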
In this line:
jobtitlelist = (salaries.JobTitle.value_counts()>500)[0:10]
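You first find the jobs that have more than 500 records, then you take the top 10 of those jobs (ranked by number of records, because `value_counts()` is sorted by frequency), and only then compute the average TotalPayBenefits. So your workflow is: filter by count, take the top 10, compute the mean.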
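A minimal sketch of that current workflow, using hypothetical toy data in which the most frequent titles are not the best paid (thresholds scaled down: more than 1 row instead of 500, top 2 instead of 10). The best-paid title never reaches the averaging step because it was cut when ranking by frequency.

```python
import pandas as pd

# Hypothetical toy data: frequent titles earn less than a rarer one.
salaries = pd.DataFrame({
    "JobTitle": ["Clerk"] * 4 + ["Nurse"] * 3 + ["Chief"] * 2 + ["Intern"],
    "TotalPayBenefits": [50.0] * 4 + [100.0] * 3 + [300.0] * 2 + [10.0],
})
MIN_ROWS = 1  # stands in for 500
TOP_N = 2     # stands in for 10

counts = salaries.JobTitle.value_counts()
frequent = counts[counts > MIN_ROWS]

# Current workflow: filter -> keep the TOP_N most *frequent* titles -> average.
top_by_count = frequent.head(TOP_N).index
current = (salaries[salaries.JobTitle.isin(top_by_count)]
           .groupby("JobTitle").TotalPayBenefits.mean())

# Only Clerk and Nurse appear; Chief (the highest mean) was dropped by frequency.
print(current.sort_values(ascending=False))
```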
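But based on your question, your workflow should be: filter by count, compute the mean TotalPayBenefits for every remaining job title, then sort by that mean and take the top 10.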
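A sketch of the workflow the question asks for (filter, average, then rank), assuming the `JobTitle` and `TotalPayBenefits` columns from the question. The function name `top_earning_titles` and the toy data are hypothetical; `nlargest` is used here for the final ranking, though `sort_values(ascending=False).head(top_n)` would do the same.

```python
import pandas as pd

def top_earning_titles(df: pd.DataFrame, min_rows: int = 500, top_n: int = 10) -> pd.Series:
    """Mean TotalPayBenefits of the top_n best-paid titles with more than min_rows rows."""
    counts = df["JobTitle"].value_counts()
    # 1. keep only titles with more than min_rows records
    frequent = counts[counts > min_rows].index
    subset = df[df["JobTitle"].isin(frequent)]
    # 2. average TotalPayBenefits per title
    means = subset.groupby("JobTitle")["TotalPayBenefits"].mean()
    # 3. rank by that average (descending) and keep the top_n
    return means.nlargest(top_n)

# Hypothetical toy data with scaled-down thresholds.
salaries = pd.DataFrame({
    "JobTitle": ["Clerk"] * 4 + ["Nurse"] * 3 + ["Chief"] * 2 + ["Intern"],
    "TotalPayBenefits": [50.0] * 4 + [100.0] * 3 + [300.0] * 2 + [10.0],
})
result = top_earning_titles(salaries, min_rows=1, top_n=2)
print(result)
```

On the cleaned DataFrame from the question, the defaults correspond to "more than 500 records" and "top 10": `print(top_earning_titles(salaries))`.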