Fill NaN in a DataFrame based on the value of a column

Question

I have data that resembles the following simplified example:

Col1    Col2    Col3
a       A       10.1
b       A       NaN
d       B       NaN
e       B       12.3    
f       B       NaN
g       C       14.1
h       C       NaN
i       C       NaN

...for many thousands of rows. I need to fill the NaN values in Col3 based on the value in Col2, using something analogous to the ffill method. The result I'm looking for is this:

Col1    Col2    Col3
a       A       10.1
b       A       10.1
d       B       NaN
e       B       12.3    
f       B       12.3
g       C       14.1
h       C       14.1
i       C       14.1

However, a plain ffill ignores the value in Col2. Any ideas?
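
To make the problem reproducible, here is a minimal sketch that builds the sample frame (the names `df` and `naive` are just illustrative) and shows how a plain `ffill` carries a value across the boundary between Col2 groups:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": list("AABBBCCC"),
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# Plain forward fill ignores Col2: row 'd' (group B) inherits 10.1 from group A
naive = df["Col3"].ffill()
print(naive.tolist())
```

Row d ends up with 10.1 here, whereas the desired result leaves it as NaN because no earlier row in group B has a value.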

Answer 1

If I understand correctly, you can groupby on 'Col2', then call transform on 'Col3' and have it call ffill:

In [35]:

df['Col3'] = df.groupby('Col2')['Col3'].transform(lambda x: x.ffill())
df
Out[35]:
  Col1 Col2  Col3
0    a    A  10.1
1    b    A  10.1
2    d    B   NaN
3    e    B  12.3
4    f    B  12.3
5    g    C  14.1
6    h    C  14.1
7    i    C  14.1
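
For reference, a self-contained sketch of the same idea. It assumes the sample frame from the question and uses the groupby object's own `ffill` rather than `transform` with a lambda; on this data the two should give the same result.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": list("AABBBCCC"),
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# Reference result using the transform + lambda form shown above
via_transform = df.groupby("Col2")["Col3"].transform(lambda x: x.ffill())

# Forward fill within each Col2 group; the result is aligned to the original index
df["Col3"] = df.groupby("Col2")["Col3"].ffill()
print(df)
```

Row d stays NaN in both versions, since a forward fill has nothing earlier in group B to copy from.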

Answer 2

One answer I found is the following:

df['Col3'] = df.groupby('Col2').transform('fillna', method='ffill')['Col3']

Any thoughts?

Answer 3

Is this what you're looking for?

import pandas as pd
import numpy as np


df['Col3'] = np.where(df['Col2'] == 'A', df['Col3'].fillna(10.1), df["Col3"])

Of course, replace the group label and fill value accordingly.
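
A sketch of what this does on the sample frame from the question (the name `filled` is illustrative). Note that it only handles one hard-coded group and value, so it would have to be repeated for every value of Col2, and it fills every NaN in the group rather than only those after a known value:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": list("AABBBCCC"),
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# Where Col2 is 'A', take Col3 with NaN replaced by 10.1; elsewhere keep Col3 as-is
filled = np.where(df["Col2"] == "A", df["Col3"].fillna(10.1), df["Col3"])
print(filled)
```

Only row b is filled here; the NaN values in groups B and C are left untouched.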

Answer 4

You can take a slice of the DataFrame for each unique value of Col2, forward fill each slice, and then concatenate the results.

>>> pd.concat((df.loc[df.Col2 == letter, :].ffill() for letter in df.Col2.unique()))

  Col1 Col2  Col3
0    a    A  10.1
1    b    A  10.1
2    d    B   NaN
3    e    B  12.3
4    f    B  12.3
5    g    C  14.1
6    h    C  14.1
7    i    C  14.1
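
A self-contained sketch of the slicing approach, assuming the sample frame from the question, with a check against a grouped `ffill`. The `sort_index()` call is an addition: `pd.concat` returns the rows grouped by Col2 value, so if the groups were interleaved in the original frame the row order would otherwise change.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": list("abdefghi"),
    "Col2": list("AABBBCCC"),
    "Col3": [10.1, np.nan, np.nan, 12.3, np.nan, 14.1, np.nan, np.nan],
})

# Forward fill each Col2 slice separately, then stitch the slices back together
parts = (df.loc[df["Col2"] == letter, :].ffill() for letter in df["Col2"].unique())
by_slices = pd.concat(parts).sort_index()

# Same fill computed with a groupby, for comparison
by_groupby = df.groupby("Col2")["Col3"].ffill()

same = np.array_equal(
    by_slices["Col3"].to_numpy(), by_groupby.to_numpy(), equal_nan=True
)
print(same)
```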

EDIT: It appears the method presented by @EdChum is by far the fastest.

%timeit pd.concat((df.loc[df.Col2 == letter, :].ffill() for letter in df.Col2.unique()))
100 loops, best of 3: 3.57 ms per loop

%timeit df.groupby('Col2').transform('fillna',method='ffill')['Col3']
100 loops, best of 3: 4.59 ms per loop

%timeit df.groupby('Col2')['Col3'].transform(lambda x: x.ffill())
1000 loops, best of 3: 746 µs per loop

4 answers

Answer 1: score 2, accepted, 2015-07-15 20:16:21
Answer 2: score 1, 2015-07-15 20:13:31
Answer 3: score 0, 2015-07-15 20:08:45
Answer 4: score 0, 2015-07-15 20:10:10
