
Pandas filter dataframe based on condition for the first n rows

I have a DataFrame of shape (600 000, 19). I want to filter the first 100 000 rows on one condition, the next 300 000 rows on another condition, and the remaining 200 000 rows on a third condition. How can this be done?

Currently, I split the DataFrame into 3 segments, apply each segment's condition, and then concatenate the pieces again. Is there a better way?

Example: filter the first 100 000 rows on values less than 5. In the next 300 000 rows, I don't want any values greater than 40, and so on.
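For comparison with the answers below, the split-filter-concat approach described in the question might look roughly like this. It is a scaled-down sketch: the single column `value`, the segment sizes `n1`/`n2` (standing in for 100 000 and 300 000) and the third condition (keep even values) are all hypothetical, since the question leaves them open.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 600 000 x 19 frame, with one hypothetical column "value".
rng = np.random.default_rng(0)
df = pd.DataFrame({"value": rng.integers(0, 50, size=60)})

# Hypothetical segment sizes, scaled down from 100 000 / 300 000 / rest.
n1, n2 = 10, 30

first = df.iloc[:n1]
second = df.iloc[n1:n1 + n2]
rest = df.iloc[n1 + n2:]

# Filter each segment with its own condition, then stitch the pieces back together.
filtered = pd.concat([
    first[first["value"] < 5],        # segment 1: keep values below 5
    second[second["value"] <= 40],    # segment 2: drop values above 40
    rest[rest["value"] % 2 == 0],     # segment 3: hypothetical condition (even values)
])
```

Each slice keeps its original index labels, so `filtered` stays in the original row order.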

You can try the following approach:

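```python
import numpy as np
import pandas as pd

# pd.np is deprecated and no longer available in recent pandas; use numpy directly.
sample = pd.DataFrame({'x': np.arange(100),
                       'colname': np.arange(100)})

# Each tuple is one segment: its conditions are AND-ed, and the segments are OR-ed.
# In query(), `index` refers to the row labels (here 0..99).
conditions = [('index < 5', 'colname < 3'),
              ('index > 50', 'index < 100', 'colname < 55')]
expr = ' | '.join('(' + ' & '.join(c) + ')' for c in conditions)
sample.query(expr)
```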
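The query above compares against `index`, i.e. the row labels, which match row positions only while the frame has a default 0..n-1 index. A minimal sketch of the same idea built on row positions instead, assuming a hypothetical `segments` list of `(start, stop, condition)` tuples; as in the query, rows outside every segment are dropped.

```python
import numpy as np
import pandas as pd

# Same data, but with a non-default index (0, 10, 20, ...) so that
# row labels and row positions differ.
sample = pd.DataFrame({'x': np.arange(100), 'colname': np.arange(100)},
                      index=np.arange(100) * 10)

# Hypothetical segment spec: 0-based positions, half-open [start, stop).
segments = [
    (0, 5, lambda s: s < 3),       # first 5 rows: keep colname < 3
    (51, 100, lambda s: s < 55),   # rows 51-99: keep colname < 55
]

pos = np.arange(len(sample))              # row positions, ignoring the index
mask = np.zeros(len(sample), dtype=bool)
for start, stop, cond in segments:
    in_segment = (pos >= start) & (pos < stop)
    mask |= in_segment & cond(sample['colname']).to_numpy()

result = sample[mask]
```

For the question's data, the segments would be `(0, 100_000, ...)`, `(100_000, 400_000, ...)` and `(400_000, len(df), ...)`.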

One approach is to use positional slicing with pd.concat to build a single boolean mask:

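```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 50, 60))

# Build one boolean DataFrame from per-segment conditions and use it as a mask.
# Note: these slices skip rows 10 and 40, so those rows are never tested.
df[pd.concat([df.iloc[:10] > 10,          # rows 0-9: keep values > 10
              df.iloc[11:40] < 30,        # rows 11-39: keep values < 30
              df.iloc[41:] % 2 == 0])]    # rows 41-59: keep even values
```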

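Here rows 0-9 keep only values greater than 10, rows 11-39 keep only values less than 30, and rows 41-59 keep only even values; every other cell becomes NaN. Because the slices skip rows 10 and 40, those two rows are never tested and always come out as NaN. Use df.iloc[10:40] and df.iloc[40:] to cover every row.

Then call dropna() to drop the rows that contain NaN.

Output (before dropna):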

      0
0   44.0
1   47.0
2    NaN
3    NaN
4    NaN
5   39.0
6    NaN
7   19.0
8   21.0
9   36.0
10   NaN
11   6.0
12  24.0
13  24.0
14  12.0
15   1.0
16   NaN
17   NaN
18  23.0
19   NaN
20  24.0
21  17.0
22   NaN
23  25.0
24  13.0
25   8.0
26   9.0
27  20.0
28  16.0
29   5.0
30  15.0
31   NaN
32   0.0
33  18.0
34   NaN
35  24.0
36   NaN
37  29.0
38  19.0
39  19.0
40   NaN
41   NaN
42  32.0
43   NaN
44   NaN
45  32.0
46   NaN
47  10.0
48   NaN
49   NaN
50   NaN
51  28.0
52  34.0
53   0.0
54   0.0
55  36.0
56   NaN
57  38.0
58  40.0
59   NaN
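For reference, a sketch of the same concat approach with contiguous slices, so that rows 10 and 40 are tested too, followed by the dropna() step. The conditions are the ones from this answer; only the slice bounds differ.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 50, 60))

# Contiguous positional slices, so every row is tested exactly once.
mask = pd.concat([
    df.iloc[:10] > 10,        # rows 0-9: keep values greater than 10
    df.iloc[10:40] < 30,      # rows 10-39: keep values less than 30
    df.iloc[40:] % 2 == 0,    # rows 40-59: keep even values
])

# Indexing with a boolean DataFrame keeps the shape and puts NaN where the
# mask is False; dropna() then removes those rows.
result = df[mask].dropna()
```

With several columns, `df[mask]` blanks individual cells; if whole rows should be kept or dropped instead, reduce the mask first, e.g. `df[mask.all(axis=1)]`.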


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM