How can I get, for each row, a value from the next row that matches a condition in Pandas?

Question

Suppose we have a table like the following:

A B
1 1.0
2 2.0
3 2.0
4 3.0
5 2.0
6 1.0
7 1.0

Now, for each row, I want to take the value of column A from the next row where B <= 2.0. The result is stored in C. Then we get:

A B   C
1 1.0 2
2 2.0 3
3 2.0 5 # Here we skip a row because next.B > 2.0
4 3.0 5
5 2.0 6
6 1.0 7
7 1.0 NaN
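
For anyone who wants to try this out, the example can be rebuilt in code. This is only a sketch: it assumes the input is exactly columns A and B of the expected result above, and the name `expected_c` is made up here for illustration.

```python
import pandas as pd

# Input reconstructed from the expected output above (columns A and B only).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 1.0],
})

# Expected column C; None marks the last row, which has no later match.
# (expected_c is a hypothetical helper name, not part of the question.)
expected_c = [2, 3, 5, 5, 6, 7, None]
```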

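Is there a way to do this efficiently in Pandas (or Numpy)? The dataframe may contain several million rows, and I would like this operation to take a few seconds at most.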

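If there is no fast Pandas / Numpy solution, I will just code it in Numba. However, for some reason my past Numba solutions to similar problems (nopython, nested for loops and break) were rather slow, which is why I am asking for a better approach.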
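
For reference, the loop version does not need nested loops or a break: a single backward pass that remembers the most recent qualifying value of A is enough. The sketch below is plain Python over NumPy arrays. The function name `next_matching_value` and its `threshold` parameter are made up for illustration, and compiling it with Numba is left untested here.

```python
import numpy as np

def next_matching_value(a, b, threshold=2.0):
    """For each i, return a[j] for the smallest j > i with b[j] <= threshold, else NaN."""
    out = np.full(len(a), np.nan)
    # Value of a at the nearest qualifying row seen so far, scanning from the end.
    carry = np.nan
    for i in range(len(a) - 1, -1, -1):
        out[i] = carry          # nearest match strictly below row i
        if b[i] <= threshold:   # row i itself becomes the candidate for rows above it
            carry = a[i]
    return out

a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 1.0])
c = next_matching_value(a, b)
```

Each row is visited once, so the cost is linear in the number of rows.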

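Context: in an earlier question I asked how to get a value from the next row in a time series dataframe before a delay expired. This question is related, but it does not use a time / sorted column, so searchsorted cannot be used.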

Answer 1

You can do it in a few steps:

import pandas as pd

# Initialize column 'C' with the value of column 'A'
# for all rows where 'B' is less than or equal to 2.0,
# and leave it missing where 'B' > 2.0.
# Because plain int columns do not support missing values,
# we use the nullable Int64 dtype instead
# (introduced in pandas 0.24).
df['C'] = df['A'].astype('Int64').where(df['B'] <= 2.0)

# Now fill the gaps with the value of the next row
# in which the field is filled, then shift the column up by one.
# (Assigning the result avoids fillna(method=..., inplace=True),
# which is deprecated in recent pandas versions.)
df['C'] = df['C'].bfill().shift(-1)

The result is (recent pandas versions display the missing value as <NA> rather than NaN):

>>> df
   A    B    C
0  1  1.0    2
1  2  2.0    3
2  3  2.0    5
3  4  3.0    5
4  5  2.0    6
5  6  1.0    7
6  7  1.0  NaN
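
To see why the mask, back-fill, shift sequence gives the right answer, it can help to look at each intermediate column. The following is a self-contained sketch on the example data. The names `masked` and `filled` are introduced here for illustration only, and a reasonably recent pandas is assumed.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 1.0],
})

# Step 1: keep A only where the row itself qualifies (B <= 2.0).
masked = df["A"].astype("Int64").where(df["B"] <= 2.0)

# Step 2: back-fill, so every row holds the nearest qualifying A
# at or below its own position.
filled = masked.bfill()

# Step 3: shift up by one, so each row only sees rows strictly below it.
df["C"] = filled.shift(-1)

print(pd.DataFrame({"masked": masked, "filled": filled, "C": df["C"]}))
```

Only the row with B = 3.0 is blank after step 1, and the back-fill in step 2 is what lets the row above it skip over that row.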

Answer 2

You just need to slice df to the rows where B is less than or equal to 2, then reindex, bfill and shift:

df['C'] = df.loc[df.B.le(2), 'A'].reindex(df.index).bfill().shift(-1)

Out[599]:
   A    B    C
0  1  1.0  2.0
1  2  2.0  3.0
2  3  2.0  5.0
3  4  3.0  5.0
4  5  2.0  6.0
5  6  1.0  7.0
6  7  1.0  NaN
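
Since the question also asks about Numpy, here is one possible array-only variant of the same idea. It is a sketch, not a benchmarked solution: it finds, for each row, the position of the nearest qualifying row below it with a reversed running minimum, using the row count `n` as a "no match" sentinel. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [1.0, 2.0, 2.0, 3.0, 2.0, 1.0, 1.0],
})

a = df["A"].to_numpy()
mask = (df["B"] <= 2.0).to_numpy()
n = len(df)

# Own position where the row qualifies, otherwise the sentinel n.
idx = np.where(mask, np.arange(n), n)

# Running minimum from the end: nearest qualifying position at or below each row.
nearest = np.minimum.accumulate(idx[::-1])[::-1]

# Shift up by one so each row looks strictly below itself; the last row gets the sentinel.
nxt = np.append(nearest[1:], n)

# Gather A at those positions, leaving NaN where no qualifying row exists.
c = np.full(n, np.nan)
valid = nxt < n
c[valid] = a[nxt[valid]]
df["C"] = c
```

Like the bfill-based answers, this produces a float column because NaN is used for the missing value.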