I have a dataset that I need to filter once a value has been exceeded but not after. Here is an example of the dataframe:
Dip MD
0 70 5000
1 80 6000
2 90 7000
3 80 8000
I want to filter out everything before Dip goes above 85 the first time so the resultant array should look like this:
Dip MD
0 90 7000
1 80 8000
Maybe using cummax
In [71]: df = pd.DataFrame({'Dip': [70, 80, 90, 80],
...: 'MD': [5000, 6000, 7000, 8000]})
In [72]: df[df.Dip.gt(85).cummax()]
Out[72]:
Dip MD
2 90 7000
3 80 8000
You can first find the positional index of the first value satisfying a condition:
idx = next(iter(np.where(df['Dip'] > 85)[0]), df.shape[0])
Then slice your dataframe by integer position from this value onwards:
res = df.iloc[idx:]
Choosing df.shape[0]
as the default if your condition is never satisfied ensures the entire dataframe is returned in this scenario.
Performance note
For larger data sets, you may find integer indexing more efficient than Boolean indexing:
np.random.seed(0)
df = pd.DataFrame({'A': np.random.randint(0, 100, 10**6)})
%timeit df[df['A'].gt(90).cummax()] # 36.1 ms
%timeit df.iloc[next(iter(np.where(df['A'] > 90)[0]), df.shape[0]):] # 4.04 ms
If efficiency is a primary concern, see Efficiently return the index of the first value satisfying condition in array . The idea is you don't have to traverse your entire series if the condition is satisfied earlier.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.