Copy the last seen non-empty value of a column based on a condition in the most efficient way in Pandas/Python
I need to copy the previous non-empty value of a column into a later row based on a condition. The dataset has a couple of million rows, so I need to do this as efficiently as possible; a for loop would be too computationally costly.
Any help would be highly appreciated.
|Col_A |Col_B |
|--------|--------|
|10.2.6.1| NaN |
| NaN | 51 |
| NaN | NaN |
|10.2.6.1| NaN |
| NaN | 64 |
| NaN | NaN |
| NaN | NaN |
|10.2.6.1| NaN |
The condition is this: whenever Col_A has a value (not null; 10.2.6.1 in this example), the last seen value in Col_B (51 and 64 respectively) should be pasted into that row, i.e. the row where Col_A is not null. The dataset should then look like this:
|Col_A |Col_B |
|--------|--------|
|10.2.6.1| NaN |
| NaN | 51 |
| NaN | NaN |
|10.2.6.1| 51 |
| NaN | 64 |
| NaN | NaN |
| NaN | NaN |
|10.2.6.1| 64 |
I tried the code below, but it's not working:
df.loc[df["Col_A"].notnull(),'Col_B'] = df.loc[df["Col_B"].notnull(),'Col_B']
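The attempt above most likely has no effect because pandas aligns the right-hand Series on its index labels before assigning. The right-hand side only carries the labels where Col_B is non-null (1 and 4), while the left-hand side targets the labels where Col_A is non-null (0, 3 and 7), so no labels match and NaN is written. A minimal sketch on the example data, assuming the default integer index (the names `target_rows`, `source`, `aligned` and `before` are only illustrative):

```python
import numpy as np
import pandas as pd

# Example data from the question (default integer index is an assumption).
df = pd.DataFrame({
    "Col_A": ["10.2.6.1", np.nan, np.nan, "10.2.6.1",
              np.nan, np.nan, np.nan, "10.2.6.1"],
    "Col_B": [np.nan, 51, np.nan, np.nan, 64, np.nan, np.nan, np.nan],
})

target_rows = df.index[df["Col_A"].notnull()]       # labels 0, 3, 7
source = df.loc[df["Col_B"].notnull(), "Col_B"]     # labels 1, 4

# Reindexing the source to the target labels mimics the alignment step:
# no label overlaps, so every aligned value is NaN.
aligned = source.reindex(target_rows)
print(aligned)

# The original assignment therefore writes NaN over rows that were already NaN.
before = df["Col_B"].copy()
df.loc[df["Col_A"].notnull(), "Col_B"] = source
print(df["Col_B"].equals(before))  # column content is unchanged
```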
You can use ffill to forward-fill the NaN values with the most recent non-NaN value.
If you want to keep the NaNs in Col_B, first create a new column (Col_C) as follows:
df['Col_C'] = df['Col_B'].ffill()
Then replace the value in Col_B wherever Col_A has a value, and drop the helper column:
df.loc[df['Col_A'].notnull(), 'Col_B'] = df.loc[df['Col_A'].notnull(), 'Col_C']
df = df.drop(columns=['Col_C'])
Result:
      Col_A  Col_B
0  10.2.6.1    NaN
1       NaN   51.0
2       NaN    NaN
3  10.2.6.1   51.0
4       NaN   64.0
5       NaN    NaN
6       NaN    NaN
7  10.2.6.1   64.0
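The above can be simplified if you do not need to keep the rows that contain NaN. Note that dropna() removes every row with a NaN in any column, so the first row (where Col_B has no earlier value to copy) is dropped as well. For example, it's possible to do: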
df['Col_B'] = df['Col_B'].ffill()
df = df.dropna()
Result:
      Col_A  Col_B
3  10.2.6.1   51.0
7  10.2.6.1   64.0
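As a possible variant of the helper-column approach, the same full result (all rows kept) can be sketched in one vectorized step with Series.where, which avoids creating and dropping Col_C. This is a sketch on the question's sample data, not a benchmarked solution; the name `has_a` is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reproduced from the question.
df = pd.DataFrame({
    "Col_A": ["10.2.6.1", np.nan, np.nan, "10.2.6.1",
              np.nan, np.nan, np.nan, "10.2.6.1"],
    "Col_B": [np.nan, 51, np.nan, np.nan, 64, np.nan, np.nan, np.nan],
})

has_a = df["Col_A"].notna()

# Series.where keeps the original value where the condition is True.
# Here: keep Col_B untouched where Col_A is empty, and take the
# forward-filled value on the rows where Col_A is set.
df["Col_B"] = df["Col_B"].where(~has_a, df["Col_B"].ffill())
print(df)
```

Rows where Col_A is set but no earlier Col_B value exists (the first row here) stay NaN, as in the helper-column version.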