I understand I can do something like this:
df[df['data'] > 3].index.tolist()
and take the first element of the list
However, I need this inside a loop with many iterations over a very large DataFrame. I want to get the first match and stop right there, instead of wasting time collecting every match only to discard all but the first.
Is there a way to do this with pandas? Manually iterating through the rows is extremely slow, and splitting the DataFrame into chunks and searching each one doesn't help much (possibly because it makes copies; I'm not sure).
Edit: here's an example:
import pandas as pd
data = {'data': [10, 11, 12, 14, 15, 16, 18]}  # this is over 1M entries in practice
df = pd.DataFrame.from_dict(data)
df.index[df['data'] > 14].tolist()[0]
This returns 4, as expected.
What I want is a fast way to stop execution the moment one row matches the condition.
idxmax
This still evaluates the full boolean Series before idxmax runs, so it does not stop at the first match.
df['data'].gt(3).idxmax()
argmax
df.index[(df['data'].to_numpy() > 3).argmax()]
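One caveat applies to both idxmax and argmax on a boolean mask: when no row matches, every entry is False, so the "maximum" is found at the first position and the first row is returned as if it had matched. The following is a minimal sketch of a guard for that case. The helper name first_index_above and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

def first_index_above(df, column, threshold):
    """Return the index label of the first row where df[column] > threshold,
    or None if no row matches (hypothetical helper, not a pandas API)."""
    mask = df[column].to_numpy() > threshold
    if mask.size == 0:
        return None  # argmax raises on an empty array, so handle it first
    pos = mask.argmax()  # position of the first True, or 0 if all False
    if not mask[pos]:
        return None  # all False: position 0 is not a real match
    return df.index[pos]

df = pd.DataFrame({'data': [10, 11, 12, 14, 15, 16, 18]})
result_found = first_index_above(df, 'data', 14)
result_missing = first_index_above(df, 'data', 100)
```

This still builds the whole mask, so it addresses correctness rather than early stopping.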
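def find(s):
    # Plain Python loop: returns at the first match, but is slow on a large Series.
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement.
    for i, v in s.items():
        if v > 3:
            return i
find(df['data'])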
from numba import njit

@njit
def find(a, b, c):
    for x, y in zip(a, b):
        if y > c:
            return x

find(df.index.to_numpy(), df['data'].to_numpy(), 3)
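If numba is not available, one possible middle ground is to scan the underlying NumPy array in fixed-size chunks and stop at the first chunk that contains a match. This is only a sketch: the function name first_position_chunked and the default chunk_size are assumptions chosen for illustration, and whether it beats a single vectorized pass depends on how early the first match occurs. Basic slices of a NumPy array are views, so the chunking itself does not copy the data.

```python
import numpy as np

def first_position_chunked(values, threshold, chunk_size=65536):
    """Return the position of the first element greater than threshold,
    or -1 if there is none (hypothetical helper for illustration)."""
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]  # a view, not a copy
        hits = chunk > threshold  # vectorized comparison on this chunk only
        if hits.any():
            # argmax gives the first True within the chunk
            return start + int(hits.argmax())
    return -1  # no element matched

values = np.array([10, 11, 12, 14, 15, 16, 18])
pos_found = first_position_chunked(values, 14, chunk_size=2)
pos_missing = first_position_chunked(values, 100, chunk_size=2)
```

With a DataFrame, the array would come from df['data'].to_numpy(), and a non-negative position can be mapped back to a label with df.index[pos].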