Suppose I have a dataframe like this:
fname lname email
Joe Aaron
Joe Aaron some@some.com
Bill Smith
Bill Smith
Bill Smith some2@some.com
Is there a terse and convenient way to drop rows where {fname, lname} is duplicated and email is blank?
You should first check whether your "empty" data is NaN or empty strings. If it is a mixture, you may need to modify the logic below.
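A minimal sketch of that check. The sample data deliberately mixes NaN and "" in the email column; it is an assumption for illustration, not the asker's real data. It counts each kind of blank and then normalizes empty or whitespace-only strings to NaN, so a single rule applies afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: blanks are a mixture of NaN and empty strings.
df = pd.DataFrame({
    "fname": ["Joe", "Joe", "Bill", "Bill", "Bill"],
    "lname": ["Aaron", "Aaron", "Smith", "Smith", "Smith"],
    "email": [np.nan, "some@some.com", "", np.nan, "some2@some.com"],
})

# Count NaN cells and empty-string cells separately (NaN == "" is False).
n_nan = df["email"].isna().sum()
n_empty = (df["email"] == "").sum()
print(n_nan, n_empty)

# Normalize: turn empty or whitespace-only strings into NaN.
df["email"] = df["email"].replace(r"^\s*$", np.nan, regex=True)
print(df["email"].isna().sum())
```

After this normalization, the NaN-based approach can be used unchanged.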
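If the blanks are NaN, sort by email (NaN values go last by default) and then keep the first row for each {fname, lname} pair, using pd.DataFrame.sort_values and pd.DataFrame.drop_duplicates: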
df = df.sort_values('email')\
.drop_duplicates(['fname', 'lname'])
If your blank values are empty strings, specify ascending=False when sorting, because an empty string sorts before every non-empty string:
df = df.sort_values('email', ascending=False)\
.drop_duplicates(['fname', 'lname'])
print(df)
fname lname email
4 Bill Smith some2@some.com
1 Joe Aaron some@some.com
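A self-contained sketch of both variants, rebuilding the question's frame once with NaN blanks and once with empty-string blanks (the variable names are illustrative only):

```python
import numpy as np
import pandas as pd

names = {
    "fname": ["Joe", "Joe", "Bill", "Bill", "Bill"],
    "lname": ["Aaron", "Aaron", "Smith", "Smith", "Smith"],
}

# Variant 1: blanks stored as NaN. sort_values puts NaN last by default
# (na_position="last"), so each name's first row has a real email if any exists.
df_nan = pd.DataFrame({**names, "email": [np.nan, "some@some.com", np.nan, np.nan, "some2@some.com"]})
out_nan = (
    df_nan.sort_values("email")
    .drop_duplicates(["fname", "lname"])  # keep="first" is the default
)

# Variant 2: blanks stored as "". Sorting descending pushes "" to the end.
df_str = pd.DataFrame({**names, "email": ["", "some@some.com", "", "", "some2@some.com"]})
out_str = (
    df_str.sort_values("email", ascending=False)
    .drop_duplicates(["fname", "lname"])
)

print(out_nan)
print(out_str)
```

Both variants keep the same two rows; only their order differs. Sorting reorders the frame, so append .sort_index() if you want the surviving rows back in their original order.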
Alternatively, you can use groupby with first. Note that empty strings are replaced with np.nan beforehand, since first returns the first non-null value in each column of each group:
df.replace('',np.nan).groupby(['fname','lname']).first().reset_index()
Out[20]:
fname lname email
0 Bill Smith some2@some.com
1 Joe Aaron some@some.com
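A runnable sketch of the groupby approach. The extra phone column is hypothetical, added only to show one caveat: first picks the first non-null value per column independently, so with several columns a result row can combine values taken from different source rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "fname": ["Joe", "Joe", "Bill", "Bill", "Bill"],
    "lname": ["Aaron", "Aaron", "Smith", "Smith", "Smith"],
    "email": ["", "some@some.com", "", "", "some2@some.com"],
    # Hypothetical column: Joe's phone is only on the row without an email.
    "phone": ["555-0100", np.nan, np.nan, np.nan, np.nan],
})

# GroupBy.first skips nulls, so turn "" into NaN before grouping.
out = (
    df.replace("", np.nan)
    .groupby(["fname", "lname"])
    .first()
    .reset_index()
)
print(out)
```

Here Joe's result row takes its email from one input row and its phone from another. If you need each output row to be a real input row, the sort_values/drop_duplicates approach is the safer choice.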