pandas sort_values

Question

While analyzing the SF Salaries dataset from Kaggle ( https://www.kaggle.com/kaggle/sf-salaries ), I would like to know the ranking of overtime pay by Year and JobTitle.

What I decided to get

My solution was:

df = df[['Year','JobTitle','OvertimePay']].copy()
df2 = df.sort_values('OvertimePay', ascending= False)

which turned out to be like this. Obviously, it didn't turn out as I expected. Apart from the index, it seems to be sorted incorrectly, since 173547.73 should be followed by 163477.81, and so on. Please help. Thank you.

Answer 1

I am not sure you've realized that each row corresponds to a different employee. So after you do df = df[['Year','JobTitle','OvertimePay']].copy(), there are multiple occurrences of "Deputy Sheriff" in the same year, one for each employee. This happens for many rows, because different employees share the same "JobTitle".
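A quick way to confirm this is to count the rows per (Year, JobTitle) pair. The sketch below uses a small made-up frame in place of Salaries.csv; the values and the names toy, rows_per_group and has_repeats are assumptions for illustration only.

```python
import pandas as pd

# Toy data standing in for Salaries.csv (values are made up for illustration).
toy = pd.DataFrame({
    'Year': [2011, 2011, 2011, 2012],
    'JobTitle': ['Deputy Sheriff', 'Deputy Sheriff', 'Firefighter', 'Deputy Sheriff'],
    'OvertimePay': [1000.0, 2500.0, 1800.0, 900.0],
})

# Number of rows (employees) for each (Year, JobTitle) pair.
rows_per_group = toy.groupby(['Year', 'JobTitle']).size()
print(rows_per_group)

# True when at least one (Year, JobTitle) pair occurs more than once.
has_repeats = toy.duplicated(subset=['Year', 'JobTitle']).any()
print(has_repeats)
```

Any pair with a count above 1 will show up several times in the sorted output, once per employee.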

To get what you want, you could use drop_duplicates to keep only the highest-paid employee for each "JobTitle" in a "Year". Note that drop_duplicates keeps the first row of each group by default, so the data has to be sorted by OvertimePay in descending order first. However, I advise you to check whether this is really what you are looking for.

Here is the code I would use:

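import numpy as np
import pandas as pd

df = pd.read_csv('Salaries.csv')
# Some rows hold the string "Not Provided"; turn those into NaN so the column is numeric.
df['OvertimePay'] = df['OvertimePay'].replace("Not Provided", np.nan).astype(float)
df = df[['Year', 'JobTitle', 'OvertimePay']].copy()
# Sort first, so that drop_duplicates keeps the highest OvertimePay per (Year, JobTitle).
df2 = df.sort_values('OvertimePay', ascending=False)
# drop_duplicates returns a new DataFrame; the result must be assigned.
df2 = df2.drop_duplicates(subset=['Year', 'JobTitle'])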
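If the goal is an explicit rank within each year, a groupby can express the same idea more directly. This is only a sketch on made-up data, not the method from the answer above: the toy frame, the Rank column and the choice of method='min' for ties are all assumptions.

```python
import pandas as pd

# Toy data standing in for Salaries.csv (values are made up for illustration).
toy = pd.DataFrame({
    'Year': [2011, 2011, 2011, 2012, 2012],
    'JobTitle': ['Deputy Sheriff', 'Deputy Sheriff', 'Firefighter',
                 'Deputy Sheriff', 'Firefighter'],
    'OvertimePay': [1000.0, 2500.0, 1800.0, 900.0, 3000.0],
})

# Highest OvertimePay per (Year, JobTitle); as_index=False keeps the keys as columns.
top = toy.groupby(['Year', 'JobTitle'], as_index=False)['OvertimePay'].max()

# Rank job titles within each year, where 1 is the highest overtime pay.
top['Rank'] = (
    top.groupby('Year')['OvertimePay']
       .rank(ascending=False, method='min')
       .astype(int)
)

ranked = top.sort_values(['Year', 'Rank']).reset_index(drop=True)
print(ranked)
```

Replacing max() with sum() or mean() would rank job titles by total or average overtime instead of by the single highest-paid employee, which may be closer to what "ranking" means for this dataset.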

EDIT: To change the format I would use something like:

print(df2.iloc[0:20,].to_string(header=['Year','JobTitle',''],index=False,justify='left',
                                formatters={'JobTitle':'{{:<{}s}}'.format(df2['JobTitle'].str.len().max()).format}))
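The nested format call in the formatter is easy to misread, so here it is taken apart on its own. The doubled braces survive the first format call as literal braces, which leaves a left-aligning template whose bound format method is then used as the column formatter. The width and the sample string below are assumptions for illustration.

```python
# Width of the longest job title; here taken from a sample string for illustration.
width = len('Deputy Sheriff')

# First format call: '{{' and '}}' become literal braces, '{}' receives the width.
template = '{{:<{}s}}'.format(width)
print(template)

# The bound method template.format is what to_string calls for each JobTitle value.
formatter = template.format
padded = formatter('Firefighter')
print(repr(padded))
```

Each value is padded on the right with spaces up to the given width, which is what left-aligns the JobTitle column in the printed table.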

Answer 1: accepted, 2019-03-15 22:55:08
