
Find first row after a specific row with higher value in a column in pandas

I have a pandas DataFrame like this:

    first   second   third
0     2       2      False
1     3       1      True
2     1       4      False
3     0       6      False
4     5       7      True
5     4       2      False
6     3       4      False
7     3       6      True

It can be created with the following code:

import pandas as pd

df = pd.DataFrame(
    {
        'first': [2, 3, 1, 0, 5, 4, 3, 3], 
        'second': [2, 1, 4, 6, 7, 2, 4, 6], 
        'third': [False, True, False, False, True, False, False, True]
    }
)

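For every row that has True in the third column, I want to find the first subsequent row in which the value in the second column is greater than the value in the first column.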

So the output should be:

    first   second   third
2     1       4      False
6     3       4      False

My main priority is to avoid using any for loops.

Any ideas?
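To check any loop-free answer against the expected output, a plain-loop reference can help. This is a sketch under one assumption about the intended semantics: the search runs forward to the end of the frame, so it may pass later True rows and may also pick a True row as the match. The helper name `first_rows_after_true` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'first': [2, 3, 1, 0, 5, 4, 3, 3],
        'second': [2, 1, 4, 6, 7, 2, 4, 6],
        'third': [False, True, False, False, True, False, False, True],
    }
)


def first_rows_after_true(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reference helper (uses loops on purpose, for checking only).

    For each row with third == True, return the first later row (by position)
    whose 'second' value is greater than its 'first' value. Assumes the search
    is not stopped by the next True row.
    """
    picked = []
    n = len(frame)
    for i in range(n):
        if not frame['third'].iloc[i]:
            continue
        # scan forward for the first row where second > first
        for j in range(i + 1, n):
            if frame['second'].iloc[j] > frame['first'].iloc[j]:
                picked.append(j)
                break
    return frame.iloc[picked]


print(first_rows_after_true(df))
```

On the example frame this selects rows 2 and 6; the True row at position 7 has no later rows, so it contributes nothing.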

You can try:

m = df['third'].cumsum()

out = (df[m.gt(0) & (~df['third'])]  # drop the rows before the first True row, and the True rows themselves
       .groupby(m, as_index=False)
       # within each group, keep the first row whose value in 'second' is greater than in 'first'
       .apply(lambda g: g[g['second'].gt(g['first'])].iloc[:1]))
print(out)

   first  second  third
0      1       4  False
1      3       4  False
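To see how the grouping key in the answer above works, the intermediate values can be printed on the example frame. This sketch only reproduces the answer's `m` and its boolean filter; the name `mask` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'first': [2, 3, 1, 0, 5, 4, 3, 3],
        'second': [2, 1, 4, 6, 7, 2, 4, 6],
        'third': [False, True, False, False, True, False, False, True],
    }
)

# Running count of True values: each row is labeled with how many True rows
# have appeared so far, so rows after the k-th True row share label k.
m = df['third'].cumsum()
print(m.tolist())  # [0, 1, 1, 1, 2, 2, 2, 3]

# Candidate rows: after the first True row, and not True themselves.
mask = m.gt(0) & (~df['third'])
print(df.index[mask].tolist())  # [2, 3, 5, 6]

# Group label of each candidate = which True row it follows.
print(m[mask].tolist())  # [1, 1, 2, 2]
```

Because each group ends where the next True row begins, this answer only looks for a match before the next True row, and True rows are never returned as matches.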

One approach, using numpy.searchsorted (for performance):

import numpy as np

# positions (ascending) of all rows where first < second
m = np.flatnonzero(df['first'] < df['second'])

# positions of all rows, except the last, where third is True
r = np.flatnonzero(df['third'].iloc[:-1])

# binary search, O(log n) per lookup, for the first position in m strictly after each r
pos = np.searchsorted(m, r, side="right")
pos = pos[pos < len(m)]  # drop True rows that have no later match
res = df.iloc[m[pos]]
print(res)

Output:

   first  second  third
2      1       4  False
6      3       4  False
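The key detail in the searchsorted answer is `side="right"`. The sketch below runs the search on the concrete positions from the example frame (values copied by hand: rows 2, 3, 4, 6 and 7 satisfy first < second, and rows 1 and 4 are the True rows other than the last one).

```python
import numpy as np

# Positions where first < second in the example frame (sorted ascending).
m = np.array([2, 3, 4, 6, 7])
# Positions of the True rows, excluding the last row.
r = np.array([1, 4])

# side="right" returns, for each value in r, the index of the first element
# of m that is strictly greater than it.
pos = np.searchsorted(m, r, side="right")
print(pos)     # [0 3]
print(m[pos])  # [2 6]

# With side="left", the True row at position 4 (where 5 < 7) would match itself.
print(m[np.searchsorted(m, r, side="left")])  # [2 4]
```

This relies on `m` being sorted, which holds because row positions increase. If some True row had no later match, its search result would equal `len(m)` and indexing `m` with it would raise an IndexError.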


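Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM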