Drop duplicates in a pandas dataframe
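I have a dataframe, and I want to sum all the "Hours" (column header) of every "Name" (column header) under one "Manager" (column header) into a single total. Then I want to drop all duplicates, sort the dataframe by total hours, and print it row by row. However, when I print row by row I keep getting duplicates of each Manager?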
|---------------------|------------------|---------------------|------------------|
| Department | Name | Manager | Hours |
|---------------------|------------------|---------------------|------------------|
| Department name | person Name | Manager Name |no of hours |
|---------------------|------------------|---------------------|------------------|
import pandas as pd

def total_group(csv_file):
    df = pd.read_csv(csv_file)
    # Broadcast each manager's summed hours onto every row of that manager
    df['Total Hours'] = df.groupby(['Manager'])['Hours'].transform('sum')
    new_df = df.drop_duplicates(subset=['Department', 'Name', 'Manager']).sort_values('Total Hours')
    for index, row in new_df.iterrows():
        manager_value = row['Manager']
        total_hours = row['Total Hours']
        print("manager: {}, has: {} Total hours".format(manager_value, total_hours))

# The function prints its own output and returns nothing
total_group(csv_file)
Printing the dataframe:
df1 = df['Total Hours'] = df.groupby(['Direct Manager'])['Labor Hours'].transform('sum')
print(df1)
Result:
0 450.0
1 450.0
2 450.0
3 450.0
4 450.0
...
43929 320.5
43930 320.5
43931 320.5
43932 320.5
43933 320.5
Name: Hours, Length: 43934, dtype: float64
Printing the new dataframe:
new_df = df.drop_duplicates(subset=['Department', 'Direct Manager']).sort_values('Total Hours')
print(new_df)
Result:
Department Name Hours Total Hours
9554 Europe Dri, Bas ... 8.0 72.000000
34498 Product & Design Sun, Sunn ... 5.0 81.000000
19140 Product & Design Oers, Len ... 8.0 122.000000
What I want is a dataframe like this:
Department Manager Total Hours
9554 Europe Last, First ... 72.000000
34498 Product Last, first ... 81.000000
19140 Design Last, First ... 122.000000
You may want to try this:
df.groupby('Manager').agg({'Hours':['sum','count']}).sort_values(('Hours','sum'), ascending=False)
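To get one line per manager instead of one per person, it may be simpler to aggregate directly rather than combine transform with drop_duplicates. The repeated managers in the question most likely come from the person's name being part of the drop_duplicates subset, which keeps one row per person. Below is a minimal sketch of the aggregation approach. The sample data and the helper name total_by_manager are made up for illustration, and taking the first department seen for each manager assumes every manager belongs to a single department.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame({
    "Department": ["Europe", "Europe", "Design", "Design", "Design"],
    "Name": ["A", "B", "C", "D", "E"],
    "Manager": ["Last1, First1", "Last1, First1",
                "Last2, First2", "Last2, First2", "Last2, First2"],
    "Hours": [8.0, 4.0, 5.0, 8.0, 2.0],
})


def total_by_manager(frame):
    """Return one row per manager with summed hours, sorted ascending."""
    totals = frame.groupby("Manager", as_index=False).agg(
        **{
            # Assumption: a manager has a single department, so "first" is enough.
            "Department": ("Department", "first"),
            "Total Hours": ("Hours", "sum"),
        }
    )
    totals = totals[["Department", "Manager", "Total Hours"]]
    return totals.sort_values("Total Hours").reset_index(drop=True)


totals = total_by_manager(df)

# Print row by row; each manager now appears exactly once.
for manager, hours in zip(totals["Manager"], totals["Total Hours"]):
    print("manager: {}, has: {} Total hours".format(manager, hours))
```

If a manager can appear in several departments and a total per department is wanted, grouping by both "Department" and "Manager" would be the variant to try.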