Filling NaN in a DataFrame based on the values of a column
I have data that resembles the following simplified example:
Col1 Col2 Col3
a A 10.1
b A NaN
d B NaN
e B 12.3
f B NaN
g C 14.1
h C NaN
i C NaN
...for many thousands of rows. I need to fill the NaN values in Col3 based on the value in Col2, using something analogous to the ffill method, so that values are only carried forward within the same Col2 group. The result I'm looking for is this:
Col1 Col2 Col3
a A 10.1
b A 10.1
d B NaN
e B 12.3
f B 12.3
g C 14.1
h C 14.1
i C 14.1
However, a plain ffill ignores the value in Col2: it would carry 10.1 from group A into row d, which belongs to group B. Any ideas?
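A minimal sketch that rebuilds the simplified example and runs an ungrouped ffill on Col3. Building the data with the `pd.DataFrame` constructor is an assumption about how it is loaded; the values are the ones shown in the question.

```python
import numpy as np
import pandas as pd

# The simplified example from the question (construction method assumed)
df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# An ungrouped forward fill ignores Col2, so row "d" (group B)
# picks up 10.1 from the last row of group A
plain = df["Col3"].ffill()
print(plain.tolist())
# [10.1, 10.1, 10.1, 12.3, 12.3, 14.1, 14.1, 14.1]
```

The third value is the problem: in the desired output, row d stays NaN because group B has no earlier value.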
If I understand correctly, you can group by 'Col2', then call transform on 'Col3' and apply ffill:
In [35]:
df['Col3'] = df.groupby('Col2')['Col3'].transform(lambda x: x.ffill())
df
Out[35]:
Col1 Col2 Col3
0 a A 10.1
1 b A 10.1
2 d B NaN
3 e B 12.3
4 f B 12.3
5 g C 14.1
6 h C 14.1
7 i C 14.1
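As a variation, grouped columns also expose `ffill` directly (`SeriesGroupBy.ffill`), so the lambda inside `transform` can be dropped. This sketch rebuilds the example data (an assumption, as above) and checks that both spellings agree; the built-in method typically avoids calling a Python function once per group.

```python
import numpy as np
import pandas as pd

# The simplified example from the question
df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# Forward fill within each Col2 group only; no lambda needed
filled = df.groupby("Col2")["Col3"].ffill()

# Compare with the transform/lambda version (NaNs in the same place count as equal)
same = filled.equals(df.groupby("Col2")["Col3"].transform(lambda x: x.ffill()))
print(same)  # True

df["Col3"] = filled
```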
One answer I found is the following:
df['Col3'] = df.groupby('Col2').transform('fillna', method='ffill')['Col3']
Any thoughts?
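A sketch of the same idea in a form that should also work on recent pandas, where the `method` argument of `fillna` is deprecated: select Col3 before transforming (so the other columns are not processed) and pass the `'ffill'` transformation by name. The example data is rebuilt here as an assumption.

```python
import numpy as np
import pandas as pd

# The simplified example from the question
df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# Only Col3 is transformed; 'ffill' is applied separately within each Col2 group
df["Col3"] = df.groupby("Col2")["Col3"].transform("ffill")
print(df)
```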
Is this what you're looking for?
import pandas as pd
import numpy as np
# Fill NaN in Col3 with 10.1, but only in rows where Col2 is 'A'
df['Col3'] = np.where(df['Col2'] == 'A', df['Col3'].fillna(10.1), df['Col3'])
Of course, replace the group label and fill value accordingly for each group.
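If every group should get its own constant, one way to avoid one `np.where` per group is to map Col2 to a fill value and pass that Series to `fillna`, which fills by index alignment. The `fill_values` dictionary below is hypothetical. Note that this is a constant fill, not a forward fill: row d becomes 12.3 here, whereas the grouped ffill leaves it NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": ["A", "A", "B", "B", "B", "C", "C", "C"],
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# Hypothetical per-group fill values; adjust to your data
fill_values = {"A": 10.1, "B": 12.3, "C": 14.1}

# Col2.map(...) gives one fill value per row; fillna aligns it on the index
df["Col3"] = df["Col3"].fillna(df["Col2"].map(fill_values))
print(df["Col3"].tolist())
# [10.1, 10.1, 12.3, 12.3, 12.3, 14.1, 14.1, 14.1]
```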
You can take a slice of the DataFrame for each unique value of Col2, forward-fill each slice, and then concatenate the results.
>>> pd.concat((df.loc[df.Col2 == letter, :].ffill() for letter in df.Col2.unique()))
Col1 Col2 Col3
0 a A 10.1
1 b A 10.1
2 d B NaN
3 e B 12.3
4 f B 12.3
5 g C 14.1
6 h C 14.1
7 i C 14.1
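One caveat with the slice-and-concat approach: `pd.concat` stacks the slices group by group, so if the groups are interleaved rather than contiguous, the rows come back out of their original order. A sketch with hypothetical interleaved data shows that `sort_index()` restores the order.

```python
import numpy as np
import pandas as pd

# Hypothetical data where the groups are interleaved rather than contiguous
df = pd.DataFrame({
    "Col2": ["A", "B", "A", "B"],
    "Col3": [1.0, 5.0, np.nan, np.nan],
})

parts = (df.loc[df.Col2 == letter, :].ffill() for letter in df.Col2.unique())
stacked = pd.concat(parts)
print(stacked.index.tolist())  # [0, 2, 1, 3]: grouped, not original order

# Sorting on the original index puts the rows back where they were
restored = stacked.sort_index()
print(restored["Col3"].tolist())  # [1.0, 5.0, 1.0, 5.0]
```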
EDIT: The method presented by @EdChum appears to be the fastest by far, roughly 5 to 6 times faster than the other two in these timings:
%timeit pd.concat((df.loc[df.Col2 == letter, :].ffill() for letter in df.Col2.unique()))
100 loops, best of 3: 3.57 ms per loop
%timeit df.groupby('Col2').transform('fillna',method='ffill')['Col3']
100 loops, best of 3: 4.59 ms per loop
%timeit df.groupby('Col2')['Col3'].transform(lambda x: x.ffill())
1000 loops, best of 3: 746 µs per loop
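The timings above are in the original IPython session, presumably on the small example. To compare the approaches at the "many thousands of rows" scale, a sketch like the following could be used. The synthetic data (10,000 rows, 100 groups in random order, about half of Col3 missing) and the helper names `concat_ffill` and `groupby_ffill` are assumptions for illustration; no results are claimed here, since they depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumed): 10,000 rows, 100 interleaved groups, ~50% NaN in Col3
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "Col2": rng.integers(0, 100, size=n),
    "Col3": np.where(rng.random(n) < 0.5, np.nan, rng.random(n)),
})


def concat_ffill(frame):
    # One forward-filled slice per group, then back to the original row order
    parts = (frame.loc[frame.Col2 == g, :].ffill() for g in frame.Col2.unique())
    return pd.concat(parts).sort_index()["Col3"]


def groupby_ffill(frame):
    # Forward fill within each Col2 group
    return frame.groupby("Col2")["Col3"].ffill()


# Make sure both approaches agree before timing them
assert concat_ffill(df).equals(groupby_ffill(df))

for fn in (concat_ffill, groupby_ffill):
    seconds = timeit.timeit(lambda: fn(df), number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1000:.2f} ms per call")
```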