
Copy the last seen non-empty value of a column based on a condition in the most efficient way in Pandas/Python

I need to copy the previous non-empty value of a column into certain rows, based on a condition. It has to be done as efficiently as possible because the data has a couple of million rows, so a for loop would be too computationally costly.

I would highly appreciate any help with this.

|Col_A   |Col_B   |
|--------|--------|
|10.2.6.1| NaN    |
|  NaN   | 51     |
|  NaN   | NaN    |
|10.2.6.1| NaN    |
|  NaN   | 64     |
|  NaN   | NaN    |
|  NaN   | NaN    |
|10.2.6.1| NaN    |

The condition is: whenever Col_A has a value (is not null; 10.2.6.1 in this example), the last seen value in Col_B (51 and 64, respectively) should be pasted into the row where Col_A is not null. The dataset should then look like this:

|Col_A   |Col_B   |
|--------|--------|
|10.2.6.1| NaN    |
|  NaN   | 51     |
|  NaN   | NaN    |
|10.2.6.1| 51     |
|  NaN   | 64     |
|  NaN   | NaN    |
|  NaN   | NaN    |
|10.2.6.1| 64     |

I tried the code below, but it's not working:

df.loc[df["Col_A"].notnull(),'Col_B'] = df.loc[df["Col_B"].notnull(),'Col_B']

You can use ffill to forward-fill the NaN values with the most recent non-NaN value.

If you want to keep the NaNs in Col_B, first create a new column (Col_C) as follows:

df['Col_C'] = df['Col_B'].ffill()

Then copy the value from Col_C into Col_B wherever Col_A has a value, and drop the helper column:

df.loc[df['Col_A'].notnull(), 'Col_B'] = df.loc[df['Col_A'].notnull(), 'Col_C']
df = df.drop(columns=['Col_C'])

Result:

       Col_A    Col_B
0   10.2.6.1      NaN
1        NaN     51.0
2        NaN      NaN
3   10.2.6.1     51.0
4        NaN     64.0
5        NaN      NaN
6        NaN      NaN
7   10.2.6.1     64.0
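
For reference, the two steps above can also be written without the helper column. The following is only a sketch under assumptions: the sample frame is rebuilt by hand from the table in the question, the name has_a is made up for illustration, and Series.mask is swapped in for the .loc assignment (it is not the approach used above, just an equivalent one). Both ffill and mask are vectorized, so no Python-level loop is involved.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (assumed dtypes).
df = pd.DataFrame({
    "Col_A": ["10.2.6.1", np.nan, np.nan, "10.2.6.1",
              np.nan, np.nan, np.nan, "10.2.6.1"],
    "Col_B": [np.nan, 51, np.nan, np.nan, 64, np.nan, np.nan, np.nan],
})

# Rows where Col_A holds a value (hypothetical helper name).
has_a = df["Col_A"].notna()

# Where has_a is True, take the forward-filled value;
# everywhere else keep Col_B unchanged (including its NaNs).
df["Col_B"] = df["Col_B"].mask(has_a, df["Col_B"].ffill())

print(df)
```

Row 0 stays NaN because there is no earlier Col_B value to carry forward.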

The above can be simplified if you do not need to keep the rows that contain NaN. For example, it's possible to do:

df['Col_B'] = df['Col_B'].ffill()
df = df.dropna()

Result:

       Col_A    Col_B
3   10.2.6.1     51.0
7   10.2.6.1     64.0
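
Note that dropna() with no arguments drops every row that has a NaN in any column, which is why row 0 (no earlier Col_B value to fill from) disappears as well. If that row should be kept, one possible variation is to restrict the check to Col_A. This is a sketch only; the sample frame is rebuilt from the question's table and the name kept is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (assumed dtypes).
df = pd.DataFrame({
    "Col_A": ["10.2.6.1", np.nan, np.nan, "10.2.6.1",
              np.nan, np.nan, np.nan, "10.2.6.1"],
    "Col_B": [np.nan, 51, np.nan, np.nan, 64, np.nan, np.nan, np.nan],
})

# Carry the last seen Col_B value forward into every following row.
df["Col_B"] = df["Col_B"].ffill()

# Drop only the rows where Col_A is missing; a NaN in Col_B is tolerated.
kept = df.dropna(subset=["Col_A"])

print(kept)
```

With this variation rows 0, 3 and 7 remain, and row 0 keeps NaN in Col_B.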
