Dropping selected rows in Pandas with duplicated columns

Question

Suppose I have a dataframe like this:

fname    lname     email

Joe      Aaron   
Joe      Aaron     some@some.com
Bill     Smith 
Bill     Smith
Bill     Smith     some2@some.com

Is there a terse and convenient way to drop rows where {fname, lname} is duplicated and email is blank?

Answer 1

First check whether your "empty" values are NaN or empty strings. If they are a mixture, you may need to modify the logic below.
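A minimal sketch of that check. The frame here is a hypothetical reconstruction of the question's data that deliberately mixes NaN and empty strings; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mixing both kinds of "empty" email values.
df = pd.DataFrame({
    'fname': ['Joe', 'Joe', 'Bill', 'Bill', 'Bill'],
    'lname': ['Aaron', 'Aaron', 'Smith', 'Smith', 'Smith'],
    'email': [np.nan, 'some@some.com', '', np.nan, 'some2@some.com'],
})

n_nan = int(df['email'].isna().sum())      # count of missing values (NaN/None)
n_blank = int((df['email'] == '').sum())   # count of empty strings

print(n_nan, n_blank)
```

If both counts are non-zero, the column holds a mixture and is worth normalizing before deduplicating.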

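If empty values are NaN

Use pd.DataFrame.sort_values and pd.DataFrame.drop_duplicates. By default, sort_values places NaN last and drop_duplicates keeps the first row of each group, so the row that has an email survives: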

df = df.sort_values('email')\
       .drop_duplicates(['fname', 'lname'])
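To try this end to end, here is a self-contained sketch. It assumes the blanks in the question's table are NaN; the frame is rebuilt by hand from that table.

```python
import numpy as np
import pandas as pd

# The question's data, assuming blank emails are NaN.
df = pd.DataFrame({
    'fname': ['Joe', 'Joe', 'Bill', 'Bill', 'Bill'],
    'lname': ['Aaron', 'Aaron', 'Smith', 'Smith', 'Smith'],
    'email': [np.nan, 'some@some.com', np.nan, np.nan, 'some2@some.com'],
})

# NaN sorts last by default (na_position='last'), so each name's real email
# comes before its NaN rows; drop_duplicates then keeps the first row per
# (fname, lname) pair (keep='first' is the default).
result = (df.sort_values('email')
            .drop_duplicates(['fname', 'lname']))
print(result)
```

Note that this keeps exactly one row per name pair, so if the same person had two different non-empty emails, only one of them would remain.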

If empty values are strings

If your empty values are empty strings, specify ascending=False when sorting. An empty string sorts before every non-empty string, so a descending sort moves it to the end:

df = df.sort_values('email', ascending=False)\
       .drop_duplicates(['fname', 'lname'])
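For the mixed case mentioned at the top of this answer, one possible approach (a sketch, not the only option) is to convert empty strings to NaN first, so that a single rule covers both kinds of "empty". The sample frame and the name cleaned are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample where "empty" emails are a mix of '' and NaN.
df = pd.DataFrame({
    'fname': ['Joe', 'Joe', 'Bill', 'Bill', 'Bill'],
    'lname': ['Aaron', 'Aaron', 'Smith', 'Smith', 'Smith'],
    'email': ['', 'some@some.com', np.nan, '', 'some2@some.com'],
})

# Treat empty strings as missing so only one kind of "empty" remains.
cleaned = df.assign(email=df['email'].replace('', np.nan))

# Same logic as the NaN case: NaN sorts last, the first row per name is kept.
result = (cleaned.sort_values('email')
                 .drop_duplicates(['fname', 'lname']))
print(result)
```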

Result

print(df)

  fname  lname           email
4  Bill  Smith  some2@some.com
1   Joe  Aaron   some@some.com
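The sort-and-deduplicate approach keeps exactly one row per name. If you instead want to follow the question literally (drop a row only when its name pair is duplicated and its email is blank), a boolean mask is one alternative. This is a sketch rather than part of the original answer; the extra Ann Lee row is a hypothetical addition that shows a unique name with a blank email being kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fname': ['Joe', 'Joe', 'Bill', 'Bill', 'Bill', 'Ann'],
    'lname': ['Aaron', 'Aaron', 'Smith', 'Smith', 'Smith', 'Lee'],
    'email': [np.nan, 'some@some.com', '', np.nan, 'some2@some.com', np.nan],
})

# True for every row whose (fname, lname) pair occurs more than once.
is_dup = df.duplicated(['fname', 'lname'], keep=False)
# True for rows whose email is NaN or an empty string.
is_blank = df['email'].isna() | (df['email'] == '')

# Drop only rows that are both duplicated and blank.
result = df[~(is_dup & is_blank)]
print(result)
```

Be aware that if every row for a given name is blank, this mask drops all of them, whereas the sort-based version would keep one.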

Answer 2

You can use first with groupby. Note that empty strings are replaced with np.nan beforehand, since first returns the first non-null value in each column.

import numpy as np

df.replace('', np.nan).groupby(['fname', 'lname']).first().reset_index()
Out[20]: 
  fname  lname           email
0  Bill  Smith  some2@some.com
1   Joe  Aaron   some@some.com
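Because first works column by column, the row it returns can combine values from different source rows. The sketch below illustrates this with a hypothetical phone column that is not part of the question's data.

```python
import numpy as np
import pandas as pd

# Two rows for the same person, each with a different column left blank.
# The 'phone' column is a hypothetical addition for illustration.
df = pd.DataFrame({
    'fname': ['Joe', 'Joe'],
    'lname': ['Aaron', 'Aaron'],
    'email': ['', 'some@some.com'],
    'phone': ['555-0100', ''],
})

result = (df.replace('', np.nan)          # empty strings become NaN
            .groupby(['fname', 'lname'])
            .first()                      # first non-null value per column
            .reset_index())
print(result)
```

Here the single output row takes its email from the second input row and its phone from the first. That may be exactly what you want when merging partial records, but it differs from simply dropping rows.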

2 answers

solution1 1 ACCEPTED 2018-06-14 00:11:00

solution2 0 2018-06-14 01:30:34