Copy the last seen non-empty value of a column based on a condition in the most efficient way in Pandas/Python

Question

I need to copy the previous non-empty value of a column based on a condition. I need to do it in the most efficient way because there are a couple of million rows. Using a for loop would be computationally costly.

So it would be highly appreciated if somebody could help me with this.

|Col_A   |Col_B   |
|--------|--------|
|10.2.6.1| NaN    |
|  NaN   | 51     |
|  NaN   | NaN    |
|10.2.6.1| NaN    |
|  NaN   | 64     |
|  NaN   | NaN    |
|  NaN   | NaN    |
|10.2.6.1| NaN    |

The condition is: whenever Col_A has a value (is not null; 10.2.6.1 in this example), the last seen value in Col_B (51 and 64 respectively) should be pasted into the corresponding row where Col_A is not null. The dataset should then look like this:

|Col_A   |Col_B   |
|--------|--------|
|10.2.6.1| NaN    |
|  NaN   | 51     |
|  NaN   | NaN    |
|10.2.6.1| 51     |
|  NaN   | 64     |
|  NaN   | NaN    |
|  NaN   | NaN    |
|10.2.6.1| 64     |

I tried the code below, but it's not working:

df.loc[df["Col_A"].notnull(),'Col_B'] = df.loc[df["Col_B"].notnull(),'Col_B']

Answer 1

You can use ffill to forward-fill the NaN values with the most recent non-NaN value.
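As a quick illustration of what forward-filling does, here is a minimal sketch that applies ffill to the Col_B values from the question on their own. The variable names col_b and filled are only for illustration.

```python
import numpy as np
import pandas as pd

# Col_B values copied from the question's sample table.
col_b = pd.Series([np.nan, 51, np.nan, np.nan, 64, np.nan, np.nan, np.nan])

# Each NaN takes the most recent non-NaN value above it.
# The leading NaN stays NaN because nothing precedes it.
filled = col_b.ffill()

print(filled.tolist())
# [nan, 51.0, 51.0, 51.0, 64.0, 64.0, 64.0, 64.0]
```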

If you want to keep the NaNs in Col_B, first create a new column (Col_C) as follows:

df['Col_C'] = df['Col_B'].ffill()

Then replace the value in Col_B wherever Col_A has a value:

df.loc[df['Col_A'].notnull(), 'Col_B'] = df.loc[df['Col_A'].notnull(), 'Col_C']
df = df.drop(columns=['Col_C'])

Result:

       Col_A    Col_B
0   10.2.6.1      NaN
1        NaN     51.0
2        NaN      NaN
3   10.2.6.1     51.0
4        NaN     64.0
5        NaN      NaN
6        NaN      NaN
7   10.2.6.1     64.0
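The same result can probably be reached without the temporary column. The sketch below is one possible variant, not part of the original answer: it rebuilds the sample frame from the question, computes the Col_A mask once (the name has_a is made up for this example), and assigns the forward-filled values only to the masked rows.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Col_A": ["10.2.6.1", np.nan, np.nan, "10.2.6.1",
              np.nan, np.nan, np.nan, "10.2.6.1"],
    "Col_B": [np.nan, 51, np.nan, np.nan, 64, np.nan, np.nan, np.nan],
})

# Rows where Col_A holds a value; computed once and reused.
has_a = df["Col_A"].notna()

# Forward-fill Col_B, then copy the filled values only into the rows
# flagged above. Both sides carry the same row labels (0, 3, 7),
# so the assignment lines up row by row.
df.loc[has_a, "Col_B"] = df["Col_B"].ffill()[has_a]

print(df)
```

The attempt in the question leaves the frame unchanged for a related reason: pandas aligns the right-hand Series on the index, and the rows where Col_B has values (1 and 4) share no labels with the rows where Col_A has values (0, 3 and 7), so only NaN gets assigned.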

The above can be simplified if you do not need to keep the NaN rows. For example, it's possible to do:

df['Col_B'] = df['Col_B'].ffill()
df = df.dropna()

Result:

       Col_A    Col_B
3   10.2.6.1     51.0
7   10.2.6.1     64.0
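One caveat worth checking on a real dataset: dropna() with no arguments drops a row if any column is NaN. If the frame has more columns than the two shown here, passing subset may be safer. The sketch below assumes a hypothetical extra column, Col_X, which does not appear in the question, purely to show the difference.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Col_A": ["10.2.6.1", np.nan, np.nan, "10.2.6.1",
              np.nan, np.nan, np.nan, "10.2.6.1"],
    "Col_B": [np.nan, 51, np.nan, np.nan, 64, np.nan, np.nan, np.nan],
    # Hypothetical unrelated column that happens to be empty.
    "Col_X": [np.nan] * 8,
})

df["Col_B"] = df["Col_B"].ffill()

# Only Col_A and Col_B decide whether a row is kept.
kept = df.dropna(subset=["Col_A", "Col_B"])

# Without subset, the empty Col_X causes every row to be dropped.
dropped_all = df.dropna()

print(kept.index.tolist())  # [3, 7]
print(len(dropped_all))     # 0
```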

Answer 1 (accepted, 2021-03-08 03:54:19)