
Filter values of a pandas DataFrame in Python

I have a dataset that I need to filter up to the point where a value is first exceeded, but not after it. Here is an example of the dataframe:

    Dip    MD
0   70      5000
1   80      6000
2   90      7000
3   80      8000

I want to filter out everything before Dip goes above 85 for the first time, so the resulting dataframe should look like this:

    Dip     MD
0   90      7000
1   80      8000

One option is to use cummax on a Boolean mask:

In [71]: df = pd.DataFrame({'Dip': [70, 80, 90, 80], 
    ...:     'MD': [5000, 6000, 7000, 8000]})         

In [72]: df[df.Dip.gt(85).cummax()]                   
Out[72]: 
   Dip    MD
2   90  7000
3   80  8000
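To see why this works, it can help to look at the intermediate masks. gt(85) is True only where Dip exceeds 85, and cummax() on a Boolean Series stays True once the first True appears, so every row from the first exceedance onwards is kept, even if Dip later drops back below 85. The reset_index(drop=True) call is an addition not in the answer above; it is only needed if you want the 0-based index shown in the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({'Dip': [70, 80, 90, 80],
                   'MD': [5000, 6000, 7000, 8000]})

exceeds = df['Dip'].gt(85)   # True only on rows where Dip > 85
keep = exceeds.cummax()      # stays True from the first True onwards

# Show the raw condition next to the "sticky" mask built from it
print(pd.DataFrame({'Dip': df['Dip'], 'exceeds': exceeds, 'keep': keep}))

# Optional: renumber the result from 0, as in the question's expected output
res = df[keep].reset_index(drop=True)
print(res)
```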

You can first find the positional index of the first value satisfying a condition:

import numpy as np
idx = next(iter(np.where(df['Dip'] > 85)[0]), df.shape[0])

Then slice your dataframe by integer position from this value onwards:

res = df.iloc[idx:]

Choosing df.shape[0] as the default if your condition is never satisfied means an empty dataframe is returned in that scenario, because df.iloc[df.shape[0]:] selects no rows. This matches the behaviour of the cummax approach.
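The edge cases are easy to check by wrapping the two steps in a function. filter_from_first below is a hypothetical helper name, not part of pandas; it simply combines the np.where lookup and the iloc slice from this answer so you can try thresholds that are never met or are met on the very first row.

```python
import numpy as np
import pandas as pd

def filter_from_first(df, col, threshold):
    """Hypothetical helper: keep rows from the first one where df[col] > threshold.

    Returns an empty DataFrame (with the same columns) if no row qualifies.
    """
    positions = np.where(df[col] > threshold)[0]   # positions of all matches
    idx = next(iter(positions), df.shape[0])      # first match, or "past the end"
    return df.iloc[idx:]

df = pd.DataFrame({'Dip': [70, 80, 90, 80],
                   'MD': [5000, 6000, 7000, 8000]})

normal = filter_from_first(df, 'Dip', 85)   # the case from the question
never = filter_from_first(df, 'Dip', 100)   # condition never met: no rows
always = filter_from_first(df, 'Dip', 50)   # first row qualifies: all rows
print(len(normal), len(never), len(always))
```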

Performance note

For larger data sets, you may find slicing by integer position more efficient than Boolean indexing:

import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame({'A': np.random.randint(0, 100, 10**6)})

%timeit df[df['A'].gt(90).cummax()]                                   # 36.1 ms
%timeit df.iloc[next(iter(np.where(df['A'] > 90)[0]), df.shape[0]):]  # 4.04 ms
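%timeit is an IPython magic, so the comparison above will not run as a plain script. A rough equivalent using the standard-library timeit module is sketched below; it also checks that the two expressions select exactly the same rows, so the timing is like for like. by_mask and by_position are illustrative names, and absolute timings will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'A': np.random.randint(0, 100, 10**6)})

def by_mask():
    # Boolean indexing: builds a full-length mask, then selects rows with it
    return df[df['A'].gt(90).cummax()]

def by_position():
    # Positional slicing from the first row where A > 90
    return df.iloc[next(iter(np.where(df['A'] > 90)[0]), df.shape[0]):]

# Confirm both approaches return the same rows before comparing speed
assert by_mask().equals(by_position())

for fn in (by_mask, by_position):
    seconds = timeit.timeit(fn, number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```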

If efficiency is a primary concern, see the question "Efficiently return the index of the first value satisfying condition in array". The idea is that you don't have to traverse the entire series if the condition is satisfied early; np.where, by contrast, always scans every element.
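One way to get that early exit while staying in NumPy is to search in fixed-size chunks and stop at the first chunk that contains a match. This is a sketch of that idea, not code taken from the linked question; first_index_above and chunk_size are hypothetical names, and the chunk size of 10,000 is an arbitrary assumption you would want to tune. It only pays off when the first match tends to appear early in the data.

```python
import numpy as np
import pandas as pd

def first_index_above(values, threshold, chunk_size=10_000):
    """Hypothetical helper: position of the first element > threshold.

    Returns len(values) when no element qualifies, mirroring the
    df.shape[0] default used with np.where above.
    """
    n = len(values)
    for start in range(0, n, chunk_size):
        # Vectorised comparison on one chunk only
        hits = values[start:start + chunk_size] > threshold
        if hits.any():
            # argmax of a Boolean array is the position of its first True
            return start + int(hits.argmax())
    return n  # condition never satisfied

np.random.seed(0)
df = pd.DataFrame({'A': np.random.randint(0, 100, 10**6)})

idx = first_index_above(df['A'].to_numpy(), 90)
res = df.iloc[idx:]
```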
