Find the first row after a specific row with a higher value in a column in pandas

Question

I have a pandas DataFrame like this:

    first   second   third
0     2       2      False
1     3       1      True
2     1       4      False
3     0       6      False
4     5       7      True
5     4       2      False
6     3       4      False
7     3       6      True

It can be created with the following code:

import pandas as pd

df = pd.DataFrame(
    {
        'first': [2, 3, 1, 0, 5, 4, 3, 3], 
        'second': [2, 1, 4, 6, 7, 2, 4, 6], 
        'third': [False, True, False, False, True, False, False, True]
    }
)

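For every row where the third column is True, I want to find the first subsequent row in which the value in the second column is greater than the value in the first column.

So the output should be: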

    first   second   third
2     1       4      False
6     3       4      False

My priority is to avoid using any for loops.

Do you have any ideas?

Answer 1

You can try:

m = df['third'].cumsum()

out = (df[m.gt(0) & (~df['third'])] # drop the leading rows before the first True, and the True rows themselves
       .groupby(m, as_index=False)
       # within each group, keep the first row whose 'second' value is greater than its 'first' value
       .apply(lambda g: g[g['second'].gt(g['first'])].iloc[:1]))

print(out)

   first  second  third
0      1       4  False
1      3       4  False
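
To make the grouping step easier to follow, here is a small sketch of the intermediate values, using the DataFrame from the question. The names group_keys, candidates and candidate_rows are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'first': [2, 3, 1, 0, 5, 4, 3, 3],
        'second': [2, 1, 4, 6, 7, 2, 4, 6],
        'third': [False, True, False, False, True, False, False, True],
    }
)

# Running count of True values seen so far: every True row starts a new group.
m = df['third'].cumsum()
group_keys = m.tolist()

# Rows that come after at least one True row and are not True rows themselves.
candidates = df[m.gt(0) & ~df['third']]
candidate_rows = candidates.index.tolist()

print(group_keys)
print(candidate_rows)
```

Each candidate row carries the group key of the most recent True row before it, so taking the first matching row per group gives the first match after each True row.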

Answer 2

One approach, using numpy.searchsorted (for performance):

import numpy as np

# find all indices where first < second
m = df.index[df['first'] < df['second']]

# find all indices, but the last, where third is True
r = df.index[:-1][df.iloc[:-1]['third']]

# use searchsorted to find in O(logn) the next row where first < second
res = df.iloc[m[np.searchsorted(m, r, side="right")]]
print(res)

Output

   first  second  third
2      1       4  False
6      3       4  False
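
The snippet above skips the last row, which covers a True value in the final position. If a True row elsewhere has no later row with first < second, searchsorted returns len(m) and the lookup would raise an IndexError. A possible guarded variant is sketched below; the helper name first_greater_after_true is hypothetical, and it works on integer positions rather than index labels, so it does not assume a default RangeIndex.

```python
import numpy as np
import pandas as pd


def first_greater_after_true(df):
    """Hypothetical helper: for each row where 'third' is True, return the
    first later row where 'second' > 'first', skipping True rows that have
    no such row after them."""
    # Integer positions of rows where first < second (already sorted).
    match_pos = np.flatnonzero((df['first'] < df['second']).to_numpy())
    # Integer positions of rows where third is True.
    true_pos = np.flatnonzero(df['third'].to_numpy())
    # side="right" finds the first matching position strictly after each True row.
    pos = np.searchsorted(match_pos, true_pos, side='right')
    # Drop True rows for which no later match exists.
    valid = pos < len(match_pos)
    return df.iloc[match_pos[pos[valid]]]


df = pd.DataFrame(
    {
        'first': [2, 3, 1, 0, 5, 4, 3, 3],
        'second': [2, 1, 4, 6, 7, 2, 4, 6],
        'third': [False, True, False, False, True, False, False, True],
    }
)

res = first_greater_after_true(df)
print(res)
```

On the question's data this selects the same rows, 2 and 6. Note that, like the original, it returns a row once per True row that points to it, so duplicates can appear when two True rows share the same next match.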
