
A solution for filtering rows of data based on a condition in pandas

I have the following example data and would like to filter it: whenever a row has col1 = 'A' and col2 = 0, I want to keep that row and all following rows up to (but not including) the next row with col1 = 'A'.
I want to do this with a pandas DataFrame, but I don't know how.

df = pd.DataFrame({'col1': ['A', 'B', 'C'], 'col2': [0, None, None]})  # schematic: col2 is set only on 'A' rows

For example, we have this data

col1 col2
 A    0
 C
 A    1 
 B
 C
 A    1 
 B
 B
 C
 A    0 
 B 
 C
 A    1 
 B 
 C
 C 

The result I want to achieve is:

col1 col2 
 A    0 
 C 
 A    0 
 B 
 C 

Thank you very much

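First, group the rows into blocks that each start with an 'A' row: the cumulative sum of col1 == 'A' gives every block its own label. Then broadcast the first col2 value of each block to all rows of that block with transform('first'), and keep only the rows where that value is 0.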

 df[df.groupby(df.col1.eq('A').cumsum()).col2.transform('first').eq(0)]

Sample data:

import pandas as pd

df = pd.DataFrame({'col1': list('ACABCABBCABCABCC'),
                   'col2': [0, None, 1, None, None, 1, None, None, None, 0, None, None, 1, None, None, None]}
                 ).astype({'col2': 'Int32'})

Result:

   col1  col2
0     A     0
1     C  <NA>
9     A     0
10    B  <NA>
11    C  <NA>
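
To see what the one-liner does, it may help to split it into its intermediate steps. The following is only a sketch on the same sample data; the names block_id, block_flag and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Same sample data as above; col2 is only set on the 'A' rows.
df = pd.DataFrame({
    'col1': list('ACABCABBCABCABCC'),
    'col2': [0, None, 1, None, None, 1, None, None, None,
             0, None, None, 1, None, None, None],
}).astype({'col2': 'Int32'})

# Step 1: every 'A' starts a new block. The running count of 'A' rows
# gives each block its own label (1, 2, 3, ...).
block_id = df['col1'].eq('A').cumsum()

# Step 2: broadcast the first non-null col2 value of each block
# to every row of that block.
block_flag = df.groupby(block_id)['col2'].transform('first')

# Step 3: keep only the rows whose block started with col2 == 0.
result = df[block_flag.eq(0)]
print(result)
```

Note that 'first' picks the first non-null value in each block. That matches the value on the 'A' row here only because col2 is empty on all other rows; if other rows could carry values too, this shortcut would need to be reconsidered.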

