
Fill empty columns with values from another column of another row based on an identifier

I am trying to fill a DataFrame that contains repeated rows, using an identifier column. My DataFrame is as follows:

   Code Value
0  SJHV   
1  SJIO    96B
2  SJHV    33C
3  CPO3    22A
4  CPO3    22A
5  SJHV    33C       # <-- values are stored as strings
6   TOY   
7   TOY             # <-- these aren't NaN, they are empty strings

I would like to remove a row with an empty 'Value' only if another row with the same 'Code' has a non-empty 'Value'. In other words, I want one row per 'Code', and the output should look like this:

   Code Value
0  SJHV    33C
1  SJIO    96B
2  CPO3    22A      
3   TOY         

My attempt was as follows:

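import numpy as np

# Treat empty strings as missing (assign back instead of inplace on a column)
df['Value'] = df['Value'].replace('', np.nan)

# Drop missing values, then keep one row per Code
df2 = df.dropna(subset=['Value']).drop_duplicates('Code')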

As expected, this also drops the 'TOY' code: all of its rows are empty, so dropna removes every one of them. Any suggestions?
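Since the attempt already turns empty strings into NaN, one possible way to finish it (a sketch, not the poster's or the answerer's code) is to group by Code instead of dropping rows. GroupBy.first() skips missing values by default, so each group yields its first non-missing Value, and a group with no value at all (TOY) yields NaN, which can be turned back into an empty string. Passing sort=False keeps the codes in order of first appearance, matching the expected output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Code': ['SJHV', 'SJIO', 'SJHV', 'CPO3', 'CPO3', 'SJHV', 'TOY', 'TOY'],
    'Value': ['', '96B', '33C', '22A', '22A', '33C', '', ''],
})

result = (
    df.assign(Value=df['Value'].replace('', np.nan))  # empty strings -> NaN
      .groupby('Code', sort=False)['Value']            # keep first-appearance order of Code
      .first()                                         # first non-NaN Value per Code
      .fillna('')                                      # codes with no value get '' back
      .reset_index()                                   # back to a Code/Value DataFrame
)
print(result)
```

On the sample data this gives SJHV, SJIO, CPO3 and TOY with a fresh 0-based index, i.e. the four rows shown in the desired output.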

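An empty string sorts before any non-empty string, so if you sort 'Value' in descending order the empty rows end up at the bottom. Then drop_duplicates(subset=['Code'], keep='first') keeps a non-empty row for every Code that has one, and falls back to an empty row only for codes like TOY that have nothing else.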

import pandas as pd

df = pd.DataFrame({'Code': ['SJHV', 'SJIO', 'SJHV', 'CPO3', 'CPO3', 'SJHV', 'TOY', 'TOY'],
                   'Value': ['', '96B', '33C', '22A', '22A', '33C', '', '']})
df = (
    df.sort_values(by=['Value'], ascending=False)      # empty strings sort last
      .drop_duplicates(subset=['Code'], keep='first')  # one row per Code, non-empty preferred
      .sort_index()                                    # restore the original row order
)
print(df)

Output

   Code Value
1  SJIO   96B
2  SJHV   33C
3  CPO3   22A
6   TOY      
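One property of the sort trick worth knowing: if a Code ever had two different non-empty values, the descending sort keeps the lexicographically largest one rather than the first one. A sketch that does not depend on sort order builds a boolean mask instead: keep a row if its Value is non-empty, or if no row with the same Code has a value. The names has_value, codes_with_value and keep are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Code': ['SJHV', 'SJIO', 'SJHV', 'CPO3', 'CPO3', 'SJHV', 'TOY', 'TOY'],
    'Value': ['', '96B', '33C', '22A', '22A', '33C', '', ''],
})

# True where this row's Value is a non-empty string
has_value = df['Value'].ne('')
# Codes that have at least one non-empty Value somewhere
codes_with_value = df.loc[has_value, 'Code'].unique()
# Keep non-empty rows, plus empty rows whose Code has no value anywhere
keep = has_value | ~df['Code'].isin(codes_with_value)
# Collapse to one row per Code, keeping the first surviving row
result = df[keep].drop_duplicates(subset=['Code'], keep='first')
print(result)
```

On the sample data this keeps rows 1, 2, 3 and 6, the same rows as the sort-based output; append .reset_index(drop=True) if a fresh 0-based index is wanted.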
