Pandas - fill NaN with string from another column
I have 2 columns ('Surname' and 'PostCode'). The dataframe is already filtered to include only duplicated surnames:
Surname | PostCode
Adams | NaN
Adams | NaN
Bryan | NX203
Bryan | NaN
Cormack | NaN
Cormack | NaN
Cormack | NZ233
Dylan | NaN
Dylan | NaN
Dylan | NaN
Some of them do not have post codes at all. For those that do, however, I'd like to fill in the missing ones with whatever value exists. For example, the second row containing 'Bryan' should be filled with NX203 (just like the row above). Similarly, the other two instances of Cormack should be filled with NZ233.
I have no idea where to start. I assume it would have to be a Python function applied to each row, but I'm not sure how to start or what to do.
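Let's try groupby().transform(). The 'first' aggregation takes the first non-null PostCode in each Surname group, and transform broadcasts it to every row of that group: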
df['PostCode'] = df.groupby('Surname').PostCode.transform('first')
Output:
Surname PostCode
0 Adams NaN
1 Adams NaN
2 Bryan NX203
3 Bryan NX203
4 Cormack NZ233
5 Cormack NZ233
6 Cormack NZ233
7 Dylan NaN
8 Dylan NaN
9 Dylan NaN
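For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question. It differs from the one-liner above in one respect, which is my own assumption about what is usually wanted: it passes the group value through fillna, so rows that already have a post code are never overwritten.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "Surname": ["Adams", "Adams", "Bryan", "Bryan", "Cormack",
                "Cormack", "Cormack", "Dylan", "Dylan", "Dylan"],
    "PostCode": [np.nan, np.nan, "NX203", np.nan, np.nan,
                 np.nan, "NZ233", np.nan, np.nan, np.nan],
})

# 'first' yields the first non-null PostCode of each Surname group;
# transform broadcasts that value back onto every row of the group.
group_value = df.groupby("Surname")["PostCode"].transform("first")

# Fill only the gaps, leaving existing post codes untouched.
df["PostCode"] = df["PostCode"].fillna(group_value)

print(df)
```

Surnames with no post code at all (Adams and Dylan here) stay missing, since there is nothing in the group to copy from.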
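Another way: groupby(), then ffill and bfill within each group. Using transform here (rather than apply) keeps the original row index, so the result can be assigned straight back to the column:
df['PostCode'] = df.groupby('Surname').PostCode.transform(lambda x: x.ffill().bfill())
Output: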
Surname PostCode
0 Adams NaN
1 Adams NaN
2 Bryan NX203
3 Bryan NX203
4 Cormack NZ233
5 Cormack NZ233
6 Cormack NZ233
7 Dylan NaN
8 Dylan NaN
9 Dylan NaN
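The two approaches give the same result on the sample data, but they can differ when a surname has more than one distinct post code. The sketch below uses made-up data (the NX999 code is hypothetical and not from the question) to show the difference: ffill/bfill copies from the nearest known row, while 'first' always copies the first known value in the group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one surname with two different known post codes.
df = pd.DataFrame({
    "Surname": ["Bryan", "Bryan", "Bryan", "Bryan"],
    "PostCode": ["NX203", np.nan, "NX999", np.nan],
})

grouped = df.groupby("Surname")["PostCode"]

# Forward fill, then backward fill, inside each group:
# each gap takes the closest known value above it (or below, if none above).
by_neighbour = grouped.transform(lambda s: s.ffill().bfill())

# First non-null value of the group, used only to fill the gaps.
by_first = df["PostCode"].fillna(grouped.transform("first"))

print(pd.DataFrame({"ffill_bfill": by_neighbour, "first": by_first}))
```

Here the last row becomes NX999 with ffill/bfill but NX203 with 'first'. Which one is right depends on whether row order carries meaning in your data.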