Drop duplicates of a column where a null value is present
I have a DataFrame df1 whose first column (col1) contains customer IDs. Col2 holds sales figures, and some of those values are missing.
My problem is that I want to drop duplicate customer IDs in col1, but only for the rows where the sales value is missing.
I tried writing a function:
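def drop(i):
    # If i is the whole DataFrame, i[col2] is a Series, and `if` cannot reduce a
    # Series to a single bool, hence the "truth value ... is ambiguous" error.
    # Also, x == np.nan is always False; pd.isna() is the proper NaN test.
    if i[col2] == np.nan:
        i.drop_duplicates(subset = 'col1')  # result is discarded, never returned
    else:
        return i['col1']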
I am getting an error saying that the truth value of a Series is ambiguous.
Thank you for reading. I would appreciate a solution.
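The attempt above fails for two independent reasons, which this small sketch isolates (the Series `s` is made-up illustration data, not the asker's df1): NaN never compares equal to anything, so `== np.nan` cannot detect missing values, and a boolean Series has to be reduced explicitly (for example with `.any()` or `.all()`) before it can be used in an `if`.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself
print(np.nan == np.nan)   # False
print(pd.isna(np.nan))    # True

s = pd.Series([2.0, np.nan])
try:
    # A whole Series cannot be used as a single bool in an `if`
    if s.isna():
        pass
except ValueError as err:
    print(err)            # the "truth value of a Series is ambiguous" message

# Reduce the Series explicitly, or keep it as an element-wise mask
print(s.isna().any())     # True
print(s.isna().tolist())  # [False, True]
```

In pandas the usual fix is to stop looping with `if` and instead build a boolean mask over the whole column, then index the DataFrame with it.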
The following should work, using groupby, apply, dropna and reset_index,
assuming your data looks something like this:
input:
col1 col2
0 1001 2.0
1 1001 NaN
2 1002 4.0
3 1002 NaN
code:
import pandas as pd
import numpy as np
#Dummy data
data = {
'col1':[1001,1001,1002,1002],
'col2':[2,np.nan,4,np.nan],
}
df = pd.DataFrame(data)
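# Solution: drop the rows with missing col2 inside each customer group.
# Selecting ['col1', 'col2'] explicitly avoids the DeprecationWarning recent
# pandas versions emit when apply() operates on the grouping column.
df.groupby('col1')[['col1', 'col2']].apply(lambda group: group.dropna(subset=['col2'])).reset_index(drop=True)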
output:
col1 col2
0 1001 2.0
1 1002 4.0
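Because dropna runs on every group, the groupby approach keeps the same rows as `df.dropna(subset=['col2'])` (ordered by col1), so a customer whose rows all lack sales disappears entirely. If such customers should survive, a sketch along these lines could work. It assumes that a customer with no sales at all should keep exactly one row; customers 1003 and 1004 are hypothetical test data, not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1001, 1001, 1002, 1002, 1003, 1004, 1004],
    'col2': [2, np.nan, 4, np.nan, np.nan, np.nan, np.nan],
})

missing = df['col2'].isna()
# For each row: does its customer have at least one non-missing sales value?
customer_has_sales = (~missing).groupby(df['col1']).transform('any')
# True on the first occurrence of each customer id
first_row = ~df.duplicated(subset='col1')

# Keep every row with sales; for customers with no sales at all, keep one row
keep = ~missing | (~customer_has_sales & first_row)
result = df[keep].reset_index(drop=True)
print(result)
```

Here the missing-sales rows of 1001 and 1002 are dropped, 1003 keeps its single row, and 1004 keeps only its first row. The mask-based version also avoids `groupby().apply()`, which is usually slower than vectorised boolean indexing.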