Pandas - fill NaN with string from another column

I have 2 columns ('Surname' and 'PostCode'). The dataframe is already filtered to include only duplicated surnames:

Surname | PostCode
Adams   | NaN
Adams   | NaN
Bryan   | NX203
Bryan   | NaN
Cormack | NaN
Cormack | NaN
Cormack | NZ233
Dylan   | NaN
Dylan   | NaN
Dylan   | NaN

Some of them do not have post codes at all. For those that do, however, I'd like to fill in the missing ones with whatever is there. For example, the second row containing 'Bryan' should be filled with NX203 (just like the row above). Similarly, the other two instances of Cormack should be filled with NZ233.
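To experiment with the answers, the sample frame can be rebuilt as below. This is a sketch assuming the missing post codes are real NaN values rather than the literal string "NaN". Counting non-null values per surname shows which groups can be filled at all.

```python
import numpy as np
import pandas as pd

# Rebuild the (already filtered) example frame from the question.
df = pd.DataFrame({
    "Surname": ["Adams", "Adams", "Bryan", "Bryan",
                "Cormack", "Cormack", "Cormack",
                "Dylan", "Dylan", "Dylan"],
    "PostCode": [np.nan, np.nan, "NX203", np.nan,
                 np.nan, np.nan, "NZ233",
                 np.nan, np.nan, np.nan],
})

# count() ignores NaN, so this is the number of known post codes per surname.
fillable = df.groupby("Surname")["PostCode"].count()

# Only surnames with at least one known post code can be filled.
print(fillable[fillable > 0].index.tolist())  # ['Bryan', 'Cormack']
```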

I have no idea where to start. I assume it would have to be a Python function applied to each row, but I'm not sure how to start or what to do.
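For reference, the per-row function the question has in mind can be sketched like this, assuming real NaN values. The helper name fill_postcode and the lookup Series known are illustrative, not from the original. It builds a surname-to-post-code lookup once with groupby().first() (which takes the first non-null value per group), then fills each row from it with DataFrame.apply(axis=1).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Surname": ["Adams", "Adams", "Bryan", "Bryan",
                "Cormack", "Cormack", "Cormack",
                "Dylan", "Dylan", "Dylan"],
    "PostCode": [np.nan, np.nan, "NX203", np.nan,
                 np.nan, np.nan, "NZ233",
                 np.nan, np.nan, np.nan],
})

# Hypothetical lookup: first known post code per surname (NaN if none).
known = df.groupby("Surname")["PostCode"].first()

def fill_postcode(row):
    # Hypothetical helper: keep an existing post code,
    # otherwise look one up for the row's surname.
    if pd.notna(row["PostCode"]):
        return row["PostCode"]
    return known.get(row["Surname"], np.nan)

df["PostCode"] = df.apply(fill_postcode, axis=1)
print(df)
```

This works, but apply(axis=1) calls Python once per row, so the groupby-based approaches are usually much faster on large frames.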

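Let's try groupby().transform(). The 'first' aggregation takes the first non-null PostCode within each Surname group, and transform broadcasts that value back to every row of the group: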

df['PostCode'] = df.groupby('Surname').PostCode.transform('first')

Output:

   Surname PostCode
0    Adams      NaN
1    Adams      NaN
2    Bryan    NX203
3    Bryan    NX203
4  Cormack    NZ233
5  Cormack    NZ233
6  Cormack    NZ233
7    Dylan      NaN
8    Dylan      NaN
9    Dylan      NaN
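One caveat: transform('first') overwrites every row of a group, including rows that already had a post code. If a surname could legitimately have several different post codes (an assumption; the sample data never shows this), a variant that only fills the gaps passes the per-group value to fillna. The 'Evans' rows below are hypothetical data for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Evans' has two different known post codes.
df = pd.DataFrame({
    "Surname": ["Bryan", "Bryan", "Evans", "Evans", "Evans"],
    "PostCode": ["NX203", np.nan, "AB111", np.nan, "CD222"],
})

# For each row: the first non-null post code of that row's surname group.
first_code = df.groupby("Surname")["PostCode"].transform("first")
print(first_code.tolist())  # ['NX203', 'NX203', 'AB111', 'AB111', 'AB111']

# Only missing cells are replaced; the existing 'CD222' is kept.
df["PostCode"] = df["PostCode"].fillna(first_code)
print(df["PostCode"].tolist())  # ['NX203', 'NX203', 'AB111', 'AB111', 'CD222']
```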

Another way: groupby(), then ffill and bfill within each group:

# transform keeps df's row index; apply may prepend the group keys in pandas >= 2.0
df['PostCode'] = df.groupby('Surname').PostCode.transform(lambda x: x.ffill().bfill())

   Surname PostCode
0    Adams      NaN
1    Adams      NaN
2    Bryan    NX203
3    Bryan    NX203
4  Cormack    NZ233
5  Cormack    NZ233
6  Cormack    NZ233
7    Dylan      NaN
8    Dylan      NaN
9    Dylan      NaN
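On the sample data both answers give the same result, but they differ when a group holds more than one known code: ffill/bfill gives each gap the nearest earlier code in the group (or the next one if none comes before it), while 'first' gives every row the group's first code. The sketch below shows this on hypothetical 'Evans' data, using the built-in GroupBy.ffill and GroupBy.bfill methods instead of a lambda; both return a Series indexed like df.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'Evans' has two different known post codes.
df = pd.DataFrame({
    "Surname": ["Bryan", "Bryan", "Evans", "Evans", "Evans", "Evans"],
    "PostCode": ["NX203", np.nan, "AB111", np.nan, "CD222", np.nan],
})

g = df.groupby("Surname")["PostCode"]

# Forward-fill within each surname, then backward-fill within each surname.
# The second groupby uses df["Surname"], which aligns on the shared index.
nearest = g.ffill().groupby(df["Surname"]).bfill()

# transform('first') gives every row its group's first known code.
first_code = g.transform("first")

print(nearest.tolist())     # ['NX203', 'NX203', 'AB111', 'AB111', 'CD222', 'CD222']
print(first_code.tolist())  # ['NX203', 'NX203', 'AB111', 'AB111', 'AB111', 'AB111']
```

Which behaviour is right depends on the data. If each surname really maps to a single post code, both are equivalent.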
