Delete specific rows based on a specific condition in pandas

Question

I want to clean my data. This is the DataFrame I have:

import pandas as pd

# Each user appears on several rows, with values scattered across them
d = {'User': ['Mansi kinney', 'Mansi kinney', 'Mansi kinney', 'Alley Huff', 'Alley Huff', 'Alley Huff',
              'Raedden Grip', 'Raedden Grip', 'S.Sarkar', 'S.Sarkar', 'S.Sarkar'],
     'Work': ['', '', '', 'College', 'College', 'College', '', '', 'Business', 'Business', 'Business'],
     'Country': ['Aus', 'Aus', 'Australia', 'US', 'US', 'US', 'Ban', 'Ban',
                 'Ind', 'Ind', 'Ind'],
     'Dept': ['Safety', 'Safety', 'Safety', '', '', '', '', '', '', '', ''],
     'Training': ['', 'Internal', '', '', 'External', '', '', '', '', 'Internal', ''],
     'Status': ['', '', 'Active', '', '', 'Active', '', 'Active', '', '', '']
     }
df = pd.DataFrame(data=d)
df

Here I want to delete the rows where most of the cells are blank. The data for each user is scattered across several rows, so I want to combine it into a single row per user and delete the unnecessary repeated rows.

My output should be:

d = {'User': ['Mansi kinney', 'Alley Huff', 'Raedden Grip', 'S.Sarkar'],
     'Work': ['', 'College', '', 'Business'],
     'Country': ['Aus', 'US', 'Ban', 'Ind'],
     'Dept': ['Safety', '', '', ''],
     'Training': ['Internal', 'External', '', 'Internal'],
     'Status': ['Active', 'Active', 'Active', 'Active']
     }
df = pd.DataFrame(data=d)
df

I typed the whole thing on my smartphone, so please let me know whether the question is clear. Please help me clean the data and get the desired output. Thanks in advance!

Answer 1

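You can group by 'User', drop the duplicate values within each group with unique(), and aggregate what remains with ''.join. Note that distinct non-empty values in the same column are concatenated, so 'Aus' and 'Australia' become 'AusAustralia' in the output: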

df = df.groupby('User').agg(lambda x: ''.join(x.unique()))
df.reset_index(inplace=True)

print(df)  
#output:
           User      Work       Country    Dept  Training  Status
0    Alley Huff   College            US          External  Active
1  Mansi kinney            AusAustralia  Safety  Internal  Active
2  Raedden Grip                     Ban                    Active
3      S.Sarkar  Business           Ind          Internal
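
If concatenated values such as 'AusAustralia' are not wanted, one possible variation is to keep only the first non-empty value of each column per user. The following is a minimal sketch and not part of the original answer: the helper name first_non_empty is hypothetical, and it assumes blanks are stored as empty strings, as in the question's data. Passing sort=False keeps the users in their order of first appearance.

```python
import pandas as pd

# Data copied from the question (blank cells are empty strings)
d = {'User': ['Mansi kinney', 'Mansi kinney', 'Mansi kinney', 'Alley Huff', 'Alley Huff', 'Alley Huff',
              'Raedden Grip', 'Raedden Grip', 'S.Sarkar', 'S.Sarkar', 'S.Sarkar'],
     'Work': ['', '', '', 'College', 'College', 'College', '', '', 'Business', 'Business', 'Business'],
     'Country': ['Aus', 'Aus', 'Australia', 'US', 'US', 'US', 'Ban', 'Ban', 'Ind', 'Ind', 'Ind'],
     'Dept': ['Safety', 'Safety', 'Safety', '', '', '', '', '', '', '', ''],
     'Training': ['', 'Internal', '', '', 'External', '', '', '', '', 'Internal', ''],
     'Status': ['', '', 'Active', '', '', 'Active', '', 'Active', '', '', '']}
df = pd.DataFrame(data=d)


def first_non_empty(values):
    """Hypothetical helper: first non-empty string in a group's column, or '' if none."""
    non_empty = values[values != '']
    return non_empty.iloc[0] if len(non_empty) else ''


# One row per user; sort=False preserves the order in which users first appear
result = df.groupby('User', sort=False).agg(first_non_empty).reset_index()
print(result)
```

With this variation 'Mansi kinney' gets 'Aus' as the country. The Status for 'S.Sarkar' stays empty, because the input contains no status for that user, even though the desired output in the question lists 'Active' there.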

UPDATE: Here it is with your full data:

import pandas as pd
df = pd.read_excel('your_data.xls')
df = df.groupby('user').agg(lambda x: ''.join(x.unique()))
df.reset_index(inplace=True)
pd.set_option('display.max_columns', None)
print(df)

#output:
                   user                                              works  \
0        Abhishek Mitra  Director & CEO | INDIAN CYBER SECURITY SOLUTIO...   
1         Anandita Kaul       HR Associate - Recruitments at Pulp Strategy   
2         Glam Sorvey.B    Data Science & Analytics with Python Consultant   
3   Madhurima S. Sarkar  MBA in Business Analytics and Finance || The S...   
4         Mansi Makhija                                    Works at Amazon   
5       NanaoSana Singh  DevOps | AWS | Docker | Kubernetes | Git | Jen...   
6         Neeraj Mishra                               DGM at Pulp Strategy   
7       Niral Shahpatel  Scale and Strategy Lead - Global Partner Marke...   
8      Sandhya Ramagiri     Technical Program Manager at Intel Corporation   
9         Sarthak Ahuja         Associate Account Manager at Pulp Strategy   
10         UDIT NARAYAN          Student at Narula Institute Of Technology   

                             country  
0        Kolkata, West Bengal, India  
1          South Delhi, Delhi, India  
2    Portland, Oregon, United States  
3        Kolkata, West Bengal, India  
4        Noida, Uttar Pradesh, India  
5           Pune, Maharashtra, India  
6                              India  
7   Hillsboro, Oregon, United States  
8    Austin, Texas Metropolitan Area  
9                       Delhi, India  
10       Domchanch, Jharkhand, India

Answer 1 (accepted, 2021-03-12 18:48:29)