
Drop duplicates dataframe pandas

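I have a dataframe, and I want to sum all the "Hours" (column header) into one "sum" for each "Name" (column header) under a "Manager" (column header). I then want to drop all duplicates, sort the dataframe by total hours, and print it line by line. However, I keep getting duplicate Manager entries printed line by line?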

|---------------------|------------------|---------------------|------------------|
|      Department     |     Name         | Manager             | Hours            | 
|---------------------|------------------|---------------------|------------------|
|   Department name   |     person Name  | Manager Name        |no of hours       |
|---------------------|------------------|---------------------|------------------|
import pandas as pd

def total_group(csv_file):
    df = pd.read_csv(csv_file)
    # Attach each manager's total hours to every row belonging to that manager
    df['Total Hours'] = df.groupby(['Manager'])['Hours'].transform('sum')
    # Keep one row per (Department, Name, Manager) combination, sorted by total hours
    new_df = df.drop_duplicates(subset=['Department', 'Name', 'Manager']).sort_values('Total Hours')
    for index, row in new_df.iterrows():
        manager_value = row['Manager']
        total_hours = row['Total Hours']
        print("manager: {}, has: {} Total hours".format(manager_value, total_hours))


print(total_group(csv_file))

Printing the dataframe:

df1 = df['Total Hours'] = df.groupby(['Direct Manager'])['Labor Hours'].transform('sum')
print(df1)

Result:

0        450.0
1        450.0
2        450.0
3        450.0
4        450.0
         ...  
43929    320.5
43930    320.5
43931    320.5
43932    320.5
43933    320.5
Name: Hours, Length: 43934, dtype: float64

Printing the new dataframe:

new_df = df.drop_duplicates(subset=['Department', 'Direct Manager']).sort_values('Total Hours')
print(new_df)

Result:

                     Department              Name                Hours                   Total Hours
9554             Europe                     Dri, Bas ...         8.0                        72.000000
34498           Product & Design    Sun, Sunn  ...     5.0                        81.000000
19140           Product & Design    Oers, Len  ...      8.0                        122.000000

What I want is a dataframe like this:

                     Department              Manager                                Total Hours
9554             Europe                     Last, First ...                             72.000000
34498           Product                    Last, first  ...                         81.000000
19140           Design                     Last, First  ...                          122.000000

You may want to try this:

df.groupby('Manager').agg({'Hours':['sum','count']}).sort_values(('Hours','sum'), ascending=False)
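As a rough sketch of how that suggestion behaves, here it is run on a few made-up rows. The sample data is hypothetical, and the column names (Department, Name, Manager, Hours) are assumed from the table in the question. The second part is an assumed variant, not part of the suggestion itself: grouping by both Department and Manager so that each pair appears once, which seems closer to the desired output shown in the question.

```python
import pandas as pd

# Hypothetical sample rows; column names follow the table in the question.
df = pd.DataFrame({
    "Department": ["Europe", "Europe", "Product", "Product", "Design"],
    "Name": ["A", "B", "C", "D", "E"],
    "Manager": ["M1", "M1", "M2", "M2", "M3"],
    "Hours": [8.0, 4.0, 5.0, 2.5, 8.0],
})

# The suggestion above: one row per manager with the summed hours and
# the number of rows, sorted by the sum in descending order.
per_manager = (
    df.groupby("Manager")
      .agg({"Hours": ["sum", "count"]})
      .sort_values(("Hours", "sum"), ascending=False)
)
print(per_manager)

# Assumed variant: keep Department next to Manager, one row per pair,
# sorted by total hours in ascending order.
totals = (
    df.groupby(["Department", "Manager"], as_index=False)["Hours"]
      .sum()
      .rename(columns={"Hours": "Total Hours"})
      .sort_values("Total Hours")
      .reset_index(drop=True)
)

# Print line by line, as in the question; each manager now appears once.
for manager, hours in zip(totals["Manager"], totals["Total Hours"]):
    print("manager: {}, has: {} Total hours".format(manager, hours))
```

Because the grouping collapses the rows before anything is printed, no `drop_duplicates` call is needed. If a manager spans several departments, the variant yields one row per department for that manager; grouping by Manager alone avoids that.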
