Delete specific rows based on certain conditions in pandas
I want to clean my data. This is the DataFrame I have:
import pandas as pd

d = {'User': ['Mansi kinney', 'Mansi kinney', 'Mansi kinney', 'Alley Huff', 'Alley Huff', 'Alley Huff', 'Raedden Grip', 'Raedden Grip', 'S.Sarkar',
'S.Sarkar', 'S.Sarkar'],
'Work': ['', '', '', 'College', 'College', 'College', '', '', 'Business', 'Business', 'Business'],
'Country': ['Aus', 'Aus', 'Australia', 'US','US', 'US', 'Ban', 'Ban',
'Ind', 'Ind', 'Ind'],
'Dept': ['Safety', 'Safety', 'Safety', '', '', '', '', '', '', '', ''],
'Training': ['', 'Internal', '', '', 'External', '', '', '', '', 'Internal', ''],
'Status': ['', '', 'Active', '', '', 'Active', '', 'Active', '', '', '']
}
df = pd.DataFrame(data=d)
df
Here I want to delete the rows where most of the cells are blank. The data for each user is scattered across several rows, so I want it collected into a single row per user, with the unnecessary repeated rows removed.
My output should be:
d = {'User':['Mansi kinney','Alley Huff','Raedden Grip', 'S.Sarkar'],
'Work': ['', 'College', '', 'Business'],
'Country': ['Aus', 'US', 'Ban', 'Ind'],
'Dept': ['Safety', '', '', ''],
'Training':['Internal','External', '', 'Internal'],
'Status':['Active','Active','Active', 'Active']
}
df = pd.DataFrame(data=d)
df
I typed the whole thing on my smartphone, so please let me know whether the question is clear. Please help me clean the data and get the desired output. Thanks in advance!
You can group by 'User', remove duplicates within each group with unique(), and aggregate the remaining values with ''.join:
df = df.groupby('User').agg(lambda x: ''.join(x.unique()))
df.reset_index(inplace=True)
print(df)
#output:
           User      Work       Country    Dept  Training  Status
0    Alley Huff   College            US          External  Active
1  Mansi kinney            AusAustralia  Safety  Internal  Active
2  Raedden Grip                     Ban                    Active
3      S.Sarkar  Business           Ind          Internal
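Note that joining the unique values concatenates entries that differ within a group, which is why the Country for Mansi kinney comes out as 'AusAustralia' rather than the 'Aus' shown in the desired output. A minimal sketch of one alternative, assuming it is acceptable to keep only the first non-empty value per user and column (this is a different technique from the join above, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Sample data from the question
d = {
    'User': ['Mansi kinney', 'Mansi kinney', 'Mansi kinney',
             'Alley Huff', 'Alley Huff', 'Alley Huff',
             'Raedden Grip', 'Raedden Grip',
             'S.Sarkar', 'S.Sarkar', 'S.Sarkar'],
    'Work': ['', '', '', 'College', 'College', 'College',
             '', '', 'Business', 'Business', 'Business'],
    'Country': ['Aus', 'Aus', 'Australia', 'US', 'US', 'US',
                'Ban', 'Ban', 'Ind', 'Ind', 'Ind'],
    'Dept': ['Safety', 'Safety', 'Safety', '', '', '', '', '', '', '', ''],
    'Training': ['', 'Internal', '', '', 'External', '',
                 '', '', '', 'Internal', ''],
    'Status': ['', '', 'Active', '', '', 'Active', '', 'Active', '', '', ''],
}
df = pd.DataFrame(d)

result = (
    df.replace('', np.nan)           # treat blank cells as missing values
      .groupby('User', sort=False)   # sort=False keeps the original user order
      .first()                       # first non-missing value in each column
      .fillna('')                    # turn remaining gaps back into blanks
      .reset_index()
)
print(result)
```

With the sample data, the Status for S.Sarkar stays blank, because none of that user's rows contains a Status value; the 'Active' in the desired output cannot be derived from the input as posted.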
UPDATE: Here it is with your full data:
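import pandas as pd

# Load the full dataset (replace the file name with your own)
df = pd.read_excel('your_data.xls')
# Merge each user's rows, joining the distinct values in every column
df = df.groupby('user').agg(lambda x: ''.join(x.unique()))
df.reset_index(inplace=True)
# Show every column instead of eliding the middle ones
pd.set_option('display.max_columns', None)
print(df)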
#output:
                   user                                              works  \
 0       Abhishek Mitra  Director & CEO | INDIAN CYBER SECURITY SOLUTIO...
 1        Anandita Kaul       HR Associate - Recruitments at Pulp Strategy
 2        Glam Sorvey.B    Data Science & Analytics with Python Consultant
 3  Madhurima S. Sarkar  MBA in Business Analytics and Finance || The S...
 4        Mansi Makhija                                    Works at Amazon
 5      NanaoSana Singh  DevOps | AWS | Docker | Kubernetes | Git | Jen...
 6        Neeraj Mishra                               DGM at Pulp Strategy
 7      Niral Shahpatel  Scale and Strategy Lead - Global Partner Marke...
 8     Sandhya Ramagiri     Technical Program Manager at Intel Corporation
 9        Sarthak Ahuja         Associate Account Manager at Pulp Strategy
10         UDIT NARAYAN          Student at Narula Institute Of Technology

                             country
 0       Kolkata, West Bengal, India
 1         South Delhi, Delhi, India
 2   Portland, Oregon, United States
 3       Kolkata, West Bengal, India
 4       Noida, Uttar Pradesh, India
 5          Pune, Maharashtra, India
 6                             India
 7  Hillsboro, Oregon, United States
 8   Austin, Texas Metropolitan Area
 9                      Delhi, India
10       Domchanch, Jharkhand, India
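One caveat when loading from a spreadsheet: pd.read_excel returns blank cells as NaN rather than as empty strings, and ''.join raises a TypeError when it meets a non-string value. A minimal sketch of a join that skips missing values, using a small made-up DataFrame in place of the real file (the rows below and the helper name join_unique are hypothetical, chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the result of pd.read_excel(...):
# blank spreadsheet cells arrive as NaN, not as ''.
raw = pd.DataFrame({
    'user': ['A', 'A', 'B', 'B'],
    'works': [np.nan, 'Works at Amazon', 'DGM', 'DGM'],
    'country': ['India', np.nan, np.nan, np.nan],
})

def join_unique(values):
    """Join the distinct non-missing values of one column within a group."""
    # dropna() removes NaN, astype(str) guards against non-string cells,
    # and unique() keeps values in order of first appearance.
    return ''.join(values.dropna().astype(str).unique())

merged = raw.groupby('user').agg(join_unique).reset_index()
print(merged)
```

A group whose column holds only missing values ends up with an empty string, which matches how blanks are represented in the sample data from the question.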