Finding the index of the first row matching a condition in pandas

Question

I understand I can do something like this:

df[df['data'] > 3].index.tolist()

and take the first element of the list

but I need this inside a loop with many iterations over a very large DataFrame. I want to get the first match and stop right there, instead of wasting time collecting every match only to discard all but the first.

Is there a way to do this with pandas? Manually iterating through the rows is extremely slow, and splitting the DataFrame into chunks and searching each one doesn't help much (possibly because it makes copies; I'm not sure).

edit: here's an example

import pandas as pd

data = {'data': [10, 11, 12, 14, 15, 16, 18]}   # this is over 1M entries in practice
df = pd.DataFrame.from_dict(data)
df.index[df['data']>14].tolist()[0]

This returns 4, as expected.

What I want is a fast way to stop execution the moment one row matches the condition.

Answer 1

`idxmax`

This still evaluates the full boolean Series before idxmax runs, so it does not stop early.

df['data'].gt(3).idxmax()
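
One caveat worth checking: if no row satisfies the condition, the mask is all False and idxmax returns the first index label rather than signalling a miss. Below is a minimal sketch of a guard, using the sample frame from the question. The helper name `first_index_where` is hypothetical, not a pandas API.

```python
import pandas as pd

df = pd.DataFrame({'data': [10, 11, 12, 14, 15, 16, 18]})

def first_index_where(mask):
    """Return the label of the first True in a boolean Series, or None."""
    # idxmax returns the first label when every value is False,
    # so confirm that at least one match exists before calling it.
    if not mask.any():
        return None
    return mask.idxmax()

hit = first_index_where(df['data'].gt(14))    # a match exists
miss = first_index_where(df['data'].gt(100))  # no match
unguarded = df['data'].gt(100).idxmax()       # silently points at the first row
```

The extra `.any()` pass costs another scan of the mask, so this trades a little speed for a correct answer when nothing matches.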

`argmax`

df.index[(df['data'].to_numpy() > 3).argmax()]
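
If the first match tends to sit near the start, scanning the NumPy array block by block can avoid building the full boolean mask. The following is only a sketch under that assumption: `first_match_chunked` and its `chunk_size` default are hypothetical names and values, and whether it beats a single full-array argmax depends on where the match falls.

```python
import numpy as np
import pandas as pd

def first_match_chunked(values, threshold, chunk_size=65536):
    """Return the position of the first value > threshold, or -1 if none."""
    for start in range(0, len(values), chunk_size):
        # Slicing a NumPy array yields a view, so only the small
        # boolean mask for this block is allocated.
        block = values[start:start + chunk_size] > threshold
        if block.any():
            return start + int(block.argmax())
    return -1

df = pd.DataFrame({'data': [10, 11, 12, 14, 15, 16, 18]})
# A tiny chunk_size is used here only to exercise more than one block.
pos = first_match_chunked(df['data'].to_numpy(), 14, chunk_size=3)
label = df.index[pos] if pos >= 0 else None
```

The function returns a position, so it is mapped back through `df.index` to get the label.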

explicit function

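def find(s):
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for i, v in s.items():
        if v > 3:
            return i  # stop at the first match
    return None  # no value matched

find(df['data'])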
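
The same early exit can be written as a one-liner with a generator and `next`, which stops consuming rows as soon as one matches. This is a sketch on the question's sample data with the threshold 14 assumed from the question's example. It is still a Python-level loop, so expect it to be slow on large frames unless the match comes early.

```python
import pandas as pd

df = pd.DataFrame({'data': [10, 11, 12, 14, 15, 16, 18]})
s = df['data']

# next() pulls from the generator only until the first hit;
# the second argument is the default returned when nothing matches.
first = next((i for i, v in s.items() if v > 14), None)
none_found = next((i for i, v in s.items() if v > 100), None)
```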

Numba

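from numba import njit

@njit
def find(a, b, c):
    # a: index labels, b: values, c: threshold; a and b are NumPy arrays
    for x, y in zip(a, b):
        if y > c:
            return x  # the compiled loop exits at the first match

find(df.index.to_numpy(), df['data'].to_numpy(), 3)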

solution1
3 2020-01-28 22:52:53

`idxmax`

`argmax`

explicit function

Numba

finding the index of the first row matching a condition in pandas

Question

1 answers

solution1 3 2020-01-28 22:52:53

idxmax

argmax

explicit function

Numba

solution1
3 2020-01-28 22:52:53

`idxmax`

`argmax`