Find first row after a specific row with higher value in a column in pandas
I have a pandas DataFrame like this:
   first  second  third
0      2       2  False
1      3       1   True
2      1       4  False
3      0       6  False
4      5       7   True
5      4       2  False
6      3       4  False
7      3       6   True
It can be created with the following code:
import pandas as pd

df = pd.DataFrame(
    {
        'first': [2, 3, 1, 0, 5, 4, 3, 3],
        'second': [2, 1, 4, 6, 7, 2, 4, 6],
        'third': [False, True, False, False, True, False, False, True]
    }
)
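For every row that has True in the third column, I want to find the first of the following rows in which the value in the second column is greater than the value in the first column.
So the output should be: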
   first  second  third
2      1       4  False
6      3       4  False
My priority is to do this without any for loops.
Do you have any ideas?
You can try:
m = df['third'].cumsum()
out = (df[m.gt(0) & (~df['third'])]  # drop the leading False rows and the True rows themselves
       .groupby(m, as_index=False)
       # within each group, keep the first row whose second value is greater than its first value
       .apply(lambda g: g[g['second'].gt(g['first'])].iloc[:1]))
print(out)
first second third
0 1 4 False
1 3 4 False
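To make the grouping step above easier to follow, here is a minimal sketch that exposes the intermediate values. It is a variation, not the answer's exact code: it swaps apply for a boolean filter followed by groupby(...).head(1), and the names group_labels, candidates, hits and first_hits, as well as the 'group' label given to the cumulative sum, are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'first': [2, 3, 1, 0, 5, 4, 3, 3],
    'second': [2, 1, 4, 6, 7, 2, 4, 6],
    'third': [False, True, False, False, True, False, False, True],
})

# Running count of True rows: every True row starts a new group label.
# The rename is an illustrative choice to avoid clashing with the 'third' column.
m = df['third'].cumsum().rename('group')
group_labels = m.tolist()

# Candidate rows: located after the first True row and not True themselves.
candidates = df[m.gt(0) & ~df['third']]

# Keep the candidates with second > first, then take the first one per group.
hits = candidates[candidates['second'].gt(candidates['first'])]
first_hits = hits.groupby(m.loc[hits.index]).head(1)

print(group_labels)
print(first_hits)
```

Because head(1) is a filter, this sketch keeps the original row labels (2 and 6) in the result. As with the code above, the search for each True row stops at the next True row, and True rows are never returned as matches.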
One approach, using numpy.searchsorted (for performance):
import numpy as np
# find all indices where first < second
m = df.index[df['first'] < df['second']]
# find all indices, but the last, where third is True
r = df.index[:-1][df.iloc[:-1]['third']]
# use searchsorted to find, in O(log n) per lookup, the next row where first < second
res = df.iloc[m[np.searchsorted(m, r, side="right")]]
print(res)
Output:
   first  second  third
2      1       4  False
6      3       4  False
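The snippet above assumes that every True row it looks up is followed by at least one row with first < second; otherwise searchsorted returns an insertion point equal to len(m) and the lookup fails. A minimal sketch of a guarded variant follows, assuming it is acceptable to skip True rows that have no later match. It works on integer positions via np.flatnonzero, so it does not depend on the default index; the names cand, starts, pos and valid are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'first': [2, 3, 1, 0, 5, 4, 3, 3],
    'second': [2, 1, 4, 6, 7, 2, 4, 6],
    'third': [False, True, False, False, True, False, False, True],
})

# Integer positions where first < second (already sorted ascending).
cand = np.flatnonzero((df['first'] < df['second']).to_numpy())
# Integer positions where third is True.
starts = np.flatnonzero(df['third'].to_numpy())

# For each start, the slot in `cand` of the first candidate strictly after it.
pos = np.searchsorted(cand, starts, side='right')

# pos == len(cand) means no later candidate exists, so skip that start.
valid = pos < len(cand)
res = df.iloc[cand[pos[valid]]]
print(res)
```

With the sample data this selects the same rows, 2 and 6. The last True row (position 7) has nothing after it and is dropped by the valid mask instead of being excluded up front with [:-1].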