Drop selected rows with duplicated columns in pandas

Question

Suppose I have a dataframe like this:

fname    lname     email
Joe      Aaron
Joe      Aaron     some@some.com
Bill     Smith
Bill     Smith
Bill     Smith     some2@some.com

Is there a terse and convenient way to drop rows where {fname, lname} is duplicated and email is blank?

Answer 1

You should first check whether your "empty" data is NaN or empty strings. If it is a mixture of both, you may need to modify the logic below.
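A minimal sketch of that check, assuming the email column may hold a mix of NaN, empty strings and whitespace-only strings (the whitespace-only case is an assumption, not part of the question). It counts each kind of blank, then normalizes all of them to NaN so that a single rule applies afterwards:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fname': ['Joe', 'Joe', 'Bill', 'Bill', 'Bill'],
    'lname': ['Aaron', 'Aaron', 'Smith', 'Smith', 'Smith'],
    # Deliberately mixed blanks: NaN, '' and a whitespace-only string
    'email': [np.nan, 'some@some.com', '', '   ', 'some2@some.com'],
})

# How many blanks are real NaN, and how many are exact empty strings?
n_nan = df['email'].isna().sum()
n_empty = (df['email'] == '').sum()
print(n_nan, n_empty)  # → 1 1

# mask() replaces values with NaN where the condition is True.
# Existing NaN stays NaN: .str.strip() returns NaN for it and .eq('') is False.
df['email'] = df['email'].mask(df['email'].str.strip().eq(''))
print(df['email'].isna().sum())  # → 3
```

After this normalization, the NaN variant of the answer applies to every blank row.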

If empty emails are NaN

Using pd.DataFrame.sort_values and pd.DataFrame.drop_duplicates:

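# sort_values puts NaN last by default (na_position='last'),
# so drop_duplicates keeps the row with an email for each name
df = (df.sort_values('email')
        .drop_duplicates(['fname', 'lname']))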
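A self-contained sketch of the NaN case, rebuilding the question's frame with np.nan for the blank emails (the construction is an assumption about how the data is stored). Because NaN sorts to the bottom, the first row drop_duplicates sees for each name is one that has an email:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fname': ['Joe', 'Joe', 'Bill', 'Bill', 'Bill'],
    'lname': ['Aaron', 'Aaron', 'Smith', 'Smith', 'Smith'],
    'email': [np.nan, 'some@some.com', np.nan, np.nan, 'some2@some.com'],
})

# Non-null emails sort lexically at the top; NaN rows go to the bottom
ordered = df.sort_values('email')
# Keep the first row per (fname, lname), i.e. the one with an email if it has one
result = ordered.drop_duplicates(['fname', 'lname'])
print(result)
```

This prints the same two rows as the Result shown for this answer. If a name has no email at all, one of its NaN rows still survives, since drop_duplicates always keeps one row per name.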

If empty emails are empty strings

If your empty emails are empty strings, you need to specify ascending=False when sorting:

# '' sorts before any non-empty string, so sort descending to move blanks last
df = (df.sort_values('email', ascending=False)
        .drop_duplicates(['fname', 'lname']))
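A minimal sketch of the string case, assuming the blanks are exactly ''. It also shows an alternative that is not part of the answer: sorting on a boolean "is blank" flag through the key parameter of sort_values (available in pandas 1.1 and later), so the result does not depend on reverse lexical order of the emails:

```python
import pandas as pd

df = pd.DataFrame({
    'fname': ['Joe', 'Joe', 'Bill', 'Bill', 'Bill'],
    'lname': ['Aaron', 'Aaron', 'Smith', 'Smith', 'Smith'],
    'email': ['', 'some@some.com', '', '', 'some2@some.com'],
})

# In a plain ascending sort the empty string comes first
print(sorted(['', 'some@some.com']))  # → ['', 'some@some.com']

# The answer's approach: descending order pushes '' to the bottom
desc = (df.sort_values('email', ascending=False)
          .drop_duplicates(['fname', 'lname']))

# Alternative: sort on a boolean "is blank" flag (False < True, so non-blank rows first);
# kind='stable' keeps the original row order within each flag value
flagged = (df.sort_values('email', key=lambda s: s.eq(''), kind='stable')
             .drop_duplicates(['fname', 'lname']))

print(desc)
print(flagged)
```

Both keep rows 1 and 4. The key-based variant keeps the original row order instead of reverse-sorted emails, which may matter if later code relies on it.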

Result

print(df)

  fname  lname           email
4  Bill  Smith  some2@some.com
1   Joe  Aaron   some@some.com
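One caveat of sorting and then calling drop_duplicates: it keeps exactly one row per (fname, lname), so a person with two different non-blank emails loses one of them, even though the question only asks to drop rows whose email is blank. A hedged alternative, not part of this answer, removes only blank rows: it drops a blank row when the same name has an email elsewhere, then collapses repeated blank rows for names with no email at all. It assumes blanks are already NaN; the second Bill Smith address and the Ann Lee rows are hypothetical data added only to show the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fname': ['Joe', 'Joe', 'Bill', 'Bill', 'Bill', 'Bill', 'Ann', 'Ann'],
    'lname': ['Aaron', 'Aaron', 'Smith', 'Smith', 'Smith', 'Smith', 'Lee', 'Lee'],
    # Rows 5-7 are hypothetical: a second address for Bill, and a name with no email
    'email': [np.nan, 'some@some.com', np.nan, np.nan, 'some2@some.com',
              'bill@example.com', np.nan, np.nan],
})

blank = df['email'].isna()
# count() counts non-null values, so this is True on every row whose name has an email
has_email = df.groupby(['fname', 'lname'])['email'].transform('count').gt(0)

# Step 1: drop blank rows for names that already have an email somewhere
step1 = df[~(blank & has_email)]
# Step 2: for names with no email at all, keep only the first blank row
result = step1[~(step1['email'].isna() & step1.duplicated(['fname', 'lname']))]
print(result)
```

On the same data, the sort-then-drop_duplicates version returns three rows (one per name) and keeps only one of Bill's two addresses, while this sketch keeps both.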

Answer 2

You can use first with groupby. Note that you need to replace empty strings with np.nan first, since first returns the first non-null value of each column:

import numpy as np
df.replace('', np.nan).groupby(['fname', 'lname']).first().reset_index()
Out[20]: 
  fname  lname           email
0  Bill  Smith  some2@some.com
1   Joe  Aaron   some@some.com
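A runnable version of this answer, assuming the blanks are empty strings. GroupBy.first takes the first non-null value of each column independently, so when the frame has more columns than email, one output row can combine values from different input rows. The phone column below is hypothetical and only illustrates that effect:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'fname': ['Joe', 'Joe', 'Bill', 'Bill', 'Bill'],
    'lname': ['Aaron', 'Aaron', 'Smith', 'Smith', 'Smith'],
    'email': ['', 'some@some.com', '', '', 'some2@some.com'],
    # Hypothetical extra column, only to show that first() works column by column
    'phone': ['555-0100', '', '', '', ''],
})

out = (df.replace('', np.nan)          # blanks -> NaN so first() can skip them
         .groupby(['fname', 'lname'])  # groups come out sorted: Bill, then Joe
         .first()                      # first non-null value per column, per group
         .reset_index())
print(out)
```

Joe's output row takes phone from the first input row and email from the second. That is harmless when email is the only column besides the keys, as in the question.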

Answer 1: score 1, accepted, 2018-06-14 00:11:00
Answer 2: score 0, 2018-06-14 01:30:34
