Pandas Dataframe fillna() using other known column values
Given the following sample df:
Other1 Other2 Name Value
0 0 1 Johnson C
1 0 0 Johnson NaN
2 1 1 Smith R
3 1 1 Smith NaN
4 0 1 Jackson X
5 1 1 Jackson NaN
6 1 1 Jackson NaN
I want to fill the NaN values with the df['Value'] value associated with the name in that row. My desired outcome is the following, which I know can be achieved like so:
df['Value'] = df['Value'].ffill()
Other1 Other2 Name Value
0 0 1 Johnson C
1 0 0 Johnson C
2 1 1 Smith R
3 1 1 Smith R
4 0 1 Jackson X
5 1 1 Jackson X
6 1 1 Jackson X
However, this solution will not achieve the desired result if rows with the same name are not adjacent. I also cannot sort by df['Name'], as the order is important. Is there an efficient way to fill a given NaN value with the value associated with its name?
It's also important to note that a given Name will always have exactly one value associated with it. Thank you in advance.
You should use groupby and transform:
df['Value'] = df.groupby('Name')['Value'].transform('first')
df
Other1 Other2 Name Value
0 0 1 Johnson C
1 0 0 Johnson C
2 1 1 Smith R
3 1 1 Smith R
4 0 1 Jackson X
5 1 1 Jackson X
6 1 1 Jackson X
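The sample above has each name in consecutive rows, so it does not exercise the case the question asks about. As a minimal sketch, assuming a hypothetical frame where the names are interleaved and a group may start with NaN, the same call still fills every row by name without reordering anything:

```python
import numpy as np
import pandas as pd

# Hypothetical data (not from the question): same names, but rows with the
# same name are not adjacent, and some groups start with a missing value.
df = pd.DataFrame({
    "Name": ["Johnson", "Smith", "Jackson", "Johnson", "Jackson", "Smith"],
    "Value": ["C", np.nan, np.nan, np.nan, "X", "R"],
})

# 'first' picks the first non-null value in each group; transform broadcasts
# that value back to every row of the group, keeping the original row order.
df["Value"] = df.groupby("Name")["Value"].transform("first")
print(df)
```

This relies on the question's guarantee that each name has a single associated value; if a name could carry several distinct values, 'first' would silently overwrite the later ones.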
Peter's answer is not correct because the first valid value may not always be the first in the group, in which case ffill will pollute the next group with the previous group's value.
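The answer being criticized is not reproduced here, so the following is only a sketch of the failure mode, assuming a plain forward fill over rows that are already ordered by name. The data is hypothetical: the second group's valid value is not in its first row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Smith's only valid value comes after a NaN.
df = pd.DataFrame({
    "Name": ["Johnson", "Johnson", "Smith", "Smith"],
    "Value": ["C", np.nan, np.nan, "R"],
})

# A plain forward fill ignores group boundaries, so Smith's leading NaN
# inherits Johnson's value.
plain = df["Value"].ffill()

# The group-aware version fills each row from its own group only.
grouped = df.groupby("Name")["Value"].transform("first")

print(pd.DataFrame({"Name": df["Name"], "plain": plain, "grouped": grouped}))
```

In the plain column the first Smith row ends up holding Johnson's value, while the grouped column assigns Smith's own value to both Smith rows.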
ALollz's answer is fine, but dropna incurs some degree of overhead.
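The dropna-based answer referred to above is not shown here. For comparison, one possible dropna-based formulation (an assumption, not necessarily what that answer used) builds a name-to-value lookup and maps it back onto the NaN rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data with interleaved names.
df = pd.DataFrame({
    "Name": ["Johnson", "Smith", "Johnson", "Jackson", "Smith", "Jackson"],
    "Value": ["C", np.nan, np.nan, "X", "R", np.nan],
})

# Build a Name -> Value lookup from the rows that have a value.
# dropna and drop_duplicates each create an intermediate copy, which is
# the extra work the groupby/transform approach avoids.
lookup = (
    df.dropna(subset=["Value"])
      .drop_duplicates("Name")
      .set_index("Name")["Value"]
)

# Fill only the missing entries, looking each one up by its row's name.
df["Value"] = df["Value"].fillna(df["Name"].map(lookup))
print(df)
```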