简体   繁体   中英

fill up empty values with same value of another column in pandas dataframe

i have a pandas dataframe like the following:

在此处输入图片说明

How do I fill up the empty cells with same policy numbers for same product type as they exist?

Any suggestion would be very much appreciated. Thank you

Sorry for the confusion, I am adding my sample dataframe now:

sample=[{'POLICY NUMBER':' ','PRODUCT TYPE':'MED'},{'POLICY NUMBER':' ','PRODUCT TYPE':'MED'},{'POLICY NUMBER':'433M49763','PRODUCT TYPE':'MED'},{'POLICY NUMBER':'433M86968','PRODUCT TYPE':'MED'},{'POLICY NUMBER':' ','PRODUCT TYPE':'TED'},{'POLICY NUMBER':'566D158635 ','PRODUCT TYPE':'TED'},{'POLICY NUMBER':'655D158635','PRODUCT TYPE':'TED'},{'POLICY NUMBER':'789D158635','PRODUCT TYPE':'TED'}]

pd.DataFrame(sample)

please note that the empty cells have " " in them too, they are not NaN across the whole dataframe

Adding to the question above. If I have the altered dataframe as above. How do i get to the following dataframe:

在此处输入图片说明

I think you need groupby + transform :

If only one same category per group and no data are empty string s :

df['POLICY NUMBER'] = (df.groupby('PRODUCT TYPE')['POLICY NUMBER']
                         .transform(lambda x: x[x != ''].iat[0]))

print (df)
  POLICY NUMBER PRODUCT TYPE
0     433M86968          MED
1     433M86968          MED
2     433M86968          MED
3     433M86968          MED
4    566D158635          TED
5    566D158635          TED
6    566D158635          TED
7    566D158635          TED

Or if posible there are not always empty stings, but sometimes there are wtrailing whitespaces , need strip :

df['POLICY NUMBER'] = (df['POLICY NUMBER'].str.strip().groupby(df['PRODUCT TYPE'])
                                  .transform(lambda x: x[x != ''].iat[0]))

print (df)
  POLICY NUMBER PRODUCT TYPE
0     433M86968          MED
1     433M86968          MED
2     433M86968          MED
3     433M86968          MED
4    566D158635          TED
5    566D158635          TED
6    566D158635          TED
7    566D158635          TED

Solution with sorting and transform last value:

df['POLICY NUMBER'] = (df.sort_values(['PRODUCT TYPE','POLICY NUMBER'])
                         .groupby('PRODUCT TYPE')['POLICY NUMBER']
                         .transform('last'))
print (df)
  POLICY NUMBER PRODUCT TYPE
0     433M86968          MED
1     433M86968          MED
2     433M86968          MED
3     433M86968          MED
4    566D158635          TED
5    566D158635          TED
6    566D158635          TED
7    566D158635          TED

EDIT: You need replace empty strings by NaN s and then use bfill for back forward filling NaN s with ffill for forward fillin NaNs:

df['POLICY NUMBER'] = (df['POLICY NUMBER'].str.strip()
                                          .replace('',np.nan)
                                          .groupby(df['PRODUCT TYPE'])
                                          .transform(lambda x: x.bfill().ffill()))

print (df)
  POLICY NUMBER PRODUCT TYPE
0     433M49763          MED
1     433M49763          MED
2     433M49763          MED
3     433M86968          MED
4    566D158635          TED
5    566D158635          TED
6    566D158635          TED
7    789D158635          TED  

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM