I have a DataFrame df1 whose first column (col1) contains customer ids. The second column (col2) holds sales, and some of its values are missing.
My problem is that I want to drop duplicate customer ids in col1, but only in rows where the sales value is missing.
I tried writing this function:
def drop(i):
    if i[col2] == np.nan:
        i.drop_duplicates(subset = 'col1')
    else:
        return i['col1']
I am getting an error saying that the truth value of a Series is ambiguous.
Thank you for reading. I would appreciate a solution.
The following should work, using groupby, apply, dropna and reset_index,
assuming your data looks something like this:
input:
   col1  col2
0  1001   2.0
1  1001   NaN
2  1002   4.0
3  1002   NaN
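code:
import numpy as np
import pandas as pd
# Dummy data
data = {
    'col1': [1001, 1001, 1002, 1002],
    'col2': [2, np.nan, 4, np.nan],
}
df = pd.DataFrame(data)
# Solution: within each customer group, drop the rows where col2 is missing,
# then rebuild a clean 0..n-1 index
result = df.groupby('col1').apply(lambda group: group.dropna(subset=['col2'])).reset_index(drop=True)
print(result)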
output:
   col1  col2
0  1001   2.0
1  1002   4.0
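Note that the groupby version above removes every row with a missing sale, so a customer whose only row has no sale would disappear as well. If that is not what you want, a boolean mask may be closer to the question as asked. The following is only a sketch under assumptions: customer 1003 is hypothetical extra data added for illustration, and "duplicate" is taken to mean "the id occurs more than once anywhere in col1". It also shows why the original attempt failed: comparing with `== np.nan` is always False, and putting a whole Series into an `if` raises the ambiguity error, whereas `isna()` tests each element.

```python
import numpy as np
import pandas as pd

# Hypothetical data: customer 1003 appears once and has no sale recorded
df = pd.DataFrame({
    "col1": [1001, 1001, 1002, 1002, 1003],
    "col2": [2, np.nan, 4, np.nan, np.nan],
})

# True where the sale is missing; isna() works element by element
missing = df["col2"].isna()

# True for every row whose customer id occurs more than once
repeated = df["col1"].duplicated(keep=False)

# Drop only the rows that are both missing and repeated
result = df[~(missing & repeated)].reset_index(drop=True)
print(result)
```

With this data the result keeps the 2.0 row for 1001, the 4.0 row for 1002, and the single row for 1003 even though its sale is missing. One caveat of this sketch: a customer that occurs several times with every sale missing would lose all of its rows.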