Fill up empty values with same value of another column in pandas dataframe
I have a pandas dataframe like the following:
How do I fill the empty cells with the policy number that already exists for the same product type?
Any suggestion would be very much appreciated. Thank you.
Sorry for the confusion, I am adding my sample dataframe now:
sample = [{'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': '433M49763', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': '433M86968', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'TED'},
          {'POLICY NUMBER': '566D158635 ', 'PRODUCT TYPE': 'TED'},
          {'POLICY NUMBER': '655D158635', 'PRODUCT TYPE': 'TED'},
          {'POLICY NUMBER': '789D158635', 'PRODUCT TYPE': 'TED'}]
pd.DataFrame(sample)
Please note that the empty cells contain " " (a space); they are not NaN anywhere in the dataframe.
Adding to the question above: if I have the altered dataframe as above, how do I get to the following dataframe:
I think you need groupby + transform:
If each group has only one policy number and the missing data are empty strings:
df['POLICY NUMBER'] = (df.groupby('PRODUCT TYPE')['POLICY NUMBER']
.transform(lambda x: x[x != ''].iat[0]))
print (df)
POLICY NUMBER PRODUCT TYPE
0 433M86968 MED
1 433M86968 MED
2 433M86968 MED
3 433M86968 MED
4 566D158635 TED
5 566D158635 TED
6 566D158635 TED
7 566D158635 TED
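For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby + transform idea. The data and the helper name `first_non_empty` are illustrative assumptions, not the asker's frame; the helper also guards against a group with no policy number at all, a case where the bare `.iat[0]` would raise an IndexError.

```python
import pandas as pd

# Illustrative data (an assumption): one policy number per product type,
# with missing cells stored as empty strings.
df = pd.DataFrame({
    'POLICY NUMBER': ['', '', '433M86968', '433M86968', '', '566D158635'],
    'PRODUCT TYPE': ['MED', 'MED', 'MED', 'MED', 'TED', 'TED'],
})

def first_non_empty(s):
    # Hypothetical helper: return the first non-empty value of the group,
    # or '' when the group has none; transform broadcasts it to every row.
    non_empty = s[s != '']
    return non_empty.iat[0] if len(non_empty) else ''

df['POLICY NUMBER'] = (df.groupby('PRODUCT TYPE')['POLICY NUMBER']
                         .transform(first_non_empty))
print(df)
```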
Or, if the cells are possibly not always empty strings but sometimes contain only whitespace or trailing whitespace, strip is needed:
df['POLICY NUMBER'] = (df['POLICY NUMBER'].str.strip().groupby(df['PRODUCT TYPE'])
.transform(lambda x: x[x != ''].iat[0]))
print (df)
POLICY NUMBER PRODUCT TYPE
0 433M86968 MED
1 433M86968 MED
2 433M86968 MED
3 433M86968 MED
4 566D158635 TED
5 566D158635 TED
6 566D158635 TED
7 566D158635 TED
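As a sketch of how the strip variant behaves on the sample posted in the question (where blanks are a single space and one value has a trailing space), the following should run as is. Note that when a product type has several policy numbers, this picks the first non-empty one in each group.

```python
import pandas as pd

# Sample rows from the question: blanks are ' ' and one value has a trailing space.
sample = [{'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': '433M49763', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': '433M86968', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'TED'},
          {'POLICY NUMBER': '566D158635 ', 'PRODUCT TYPE': 'TED'},
          {'POLICY NUMBER': '655D158635', 'PRODUCT TYPE': 'TED'},
          {'POLICY NUMBER': '789D158635', 'PRODUCT TYPE': 'TED'}]
df = pd.DataFrame(sample)

# Strip first, so ' ' becomes '' and '566D158635 ' loses its trailing space.
cleaned = df['POLICY NUMBER'].str.strip()

# Broadcast the first non-empty value of each product type to all of its rows.
df['POLICY NUMBER'] = (cleaned.groupby(df['PRODUCT TYPE'])
                              .transform(lambda x: x[x != ''].iat[0]))
print(df)
```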
Solution with sorting and transform of the last value:
df['POLICY NUMBER'] = (df.sort_values(['PRODUCT TYPE','POLICY NUMBER'])
.groupby('PRODUCT TYPE')['POLICY NUMBER']
.transform('last'))
print (df)
POLICY NUMBER PRODUCT TYPE
0 433M86968 MED
1 433M86968 MED
2 433M86968 MED
3 433M86968 MED
4 566D158635 TED
5 566D158635 TED
6 566D158635 TED
7 566D158635 TED
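A small runnable sketch of the sorting approach, on illustrative data that is an assumption rather than the asker's frame. Empty strings sort before any policy number, so 'last' picks the lexicographically largest value in each group; that equals the wanted value only when each product type has a single policy number.

```python
import pandas as pd

# Illustrative data (an assumption): blanks are empty strings,
# and each product type has exactly one distinct policy number.
df = pd.DataFrame({
    'POLICY NUMBER': ['', '433M86968', '', '566D158635', '566D158635'],
    'PRODUCT TYPE': ['MED', 'MED', 'TED', 'TED', 'TED'],
})

# Sort so that '' comes first within each group, then take the last value per group.
filled = (df.sort_values(['PRODUCT TYPE', 'POLICY NUMBER'])
            .groupby('PRODUCT TYPE')['POLICY NUMBER']
            .transform('last'))

# Assignment aligns on the index, so the original row order is preserved.
df['POLICY NUMBER'] = filled
print(df)
```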
EDIT: You need to replace empty strings with NaNs, then use bfill to back-fill the NaNs, followed by ffill to forward-fill any NaNs that remain:
import numpy as np

df['POLICY NUMBER'] = (df['POLICY NUMBER'].str.strip()
.replace('',np.nan)
.groupby(df['PRODUCT TYPE'])
.transform(lambda x: x.bfill().ffill()))
print (df)
POLICY NUMBER PRODUCT TYPE
0 433M49763 MED
1 433M49763 MED
2 433M49763 MED
3 433M86968 MED
4 566D158635 TED
5 566D158635 TED
6 566D158635 TED
7 789D158635 TED
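For reference, a self-contained sketch of the bfill + ffill approach applied to the sample list posted in the question. Existing policy numbers are kept, and each blank takes the next policy number below it within the same product type (falling back to the previous one if the blank is at the end of its group).

```python
import numpy as np
import pandas as pd

# Sample rows from the question: blanks are ' ' and one value has a trailing space.
sample = [{'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': '433M49763', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': '433M86968', 'PRODUCT TYPE': 'MED'},
          {'POLICY NUMBER': ' ', 'PRODUCT TYPE': 'TED'},
          {'POLICY NUMBER': '566D158635 ', 'PRODUCT TYPE': 'TED'},
          {'POLICY NUMBER': '655D158635', 'PRODUCT TYPE': 'TED'},
          {'POLICY NUMBER': '789D158635', 'PRODUCT TYPE': 'TED'}]
df = pd.DataFrame(sample)

# Strip whitespace, then mark the now-empty cells as missing.
cleaned = df['POLICY NUMBER'].str.strip().replace('', np.nan)

# Within each product type, back-fill first, then forward-fill whatever is left.
df['POLICY NUMBER'] = (cleaned.groupby(df['PRODUCT TYPE'])
                              .transform(lambda x: x.bfill().ffill()))
print(df)
```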