I am trying to fill a dataframe, containing repeated elements, based on an identifier. My Dataframe is as follows:
Code Value
0 SJHV
1 SJIO 96B
2 SJHV 33C
3 CPO3 22A
4 CPO3 22A
5 SJHV 33C #< -- Numbers stored as strings
6 TOY
7 TOY #< -- These aren't NaN, they are empty strings
I would like to remove the empty 'Value' rows only if a non-empty 'Value' row exists. To be clear, I would want my output to look like:
Code Value
0 SJHV 33C
1 SJIO 96B
2 CPO3 22A
3 TOY
My attempt was as follows:
df['Value'].replace('', np.nan, inplace=True)
df2 = df.dropna(subset=['Value']).drop_duplicates('Code')
As expected, this code also drops the 'TOY' Code. Any suggestions?
The empty strings should go to the bottom if you sort them, then you can just drop duplicates.
import pandas as pd
df = pd.DataFrame({'Code':['SJHV','SJIO','SJHV','CPO3','CPO3','SJHV','TOY','TOY'],'Value':['','96B','33C','22A','22A','33C','','']})
df = (
df.sort_values(by=['Value'], ascending=False)
.drop_duplicates(subset=['Code'], keep='first')
.sort_index()
)
Output
Code Value
1 SJIO 96B
2 SJHV 33C
3 CPO3 22A
6 TOY
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.