Remove rows in a pandas dataframe between two specific values

I'm trying to remove rows from a pandas DataFrame so that everything between two specific values (e.g., start and end) is deleted, including the two values themselves. These start/end pairs can repeat, as in:

c1 c2
1 1
2 start
3 1
4 0
5 end
6 1
7 start
8 1
9 0
10 end
11 1

So the desired output would be:

c1 c2
1 1
6 1
11 1

I recreated a dataframe similar to yours. This is not an efficient way to do it, since it drops rows one at a time inside a Python loop, but it works.

df1 :

   c1     c2
0   1      1
1   2  start
2   3      3
3   4    end
4   5      5
5   6  start
6   7    end
7   8      0

code:

import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'c2': ['1', 'start', '3', 'end', '5', 'start', 'end', 0]})
df2 = df.copy()  # work on a copy so df stays intact
flag = False     # True while we are between 'start' and 'end'
for i, row in df.iterrows():
    if row['c2'] == 'start':
        flag = True
        df2 = df2.drop(i)  # drop the 'start' row itself
    elif row['c2'] == 'end':
        flag = False
        df2 = df2.drop(i)  # drop the 'end' row itself
    elif flag:
        df2 = df2.drop(i)  # drop rows inside the block

output df2 :

   c1 c2
0   1  1
4   5  5
7   8  0

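You can use masks. Note that this keeps only the rows directly before a start or directly after an end, so it reproduces the sample output but would also discard any row outside a block that is not adjacent to a marker.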

mask1 = df.c2.shift(-1) == "start"  # row directly before a 'start'
mask2 = df.c2.shift(1) == "end"     # row directly after an 'end'
newDf = df.loc[mask1 | mask2].reset_index(drop=True)

Output

   c1 c2
0   1  1
1   5  5
2   8  0
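
A more general mask can be built from running counts of the markers. The following is only a sketch, assuming that every start has a matching end and that blocks are never nested; the names is_start, is_end, inside and result are illustrative and not part of the answers here.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'c2': ['1', 'start', '3', 'end', '5', 'start', 'end', 0]})

is_start = df['c2'].eq('start')
is_end = df['c2'].eq('end')

# Blocks opened up to and including this row, minus blocks closed
# strictly before it. Shifting the 'end' count by one row keeps the
# 'end' row itself counted as inside the block.
opened = is_start.cumsum()
closed_before = is_end.cumsum().shift(fill_value=0)
inside = (opened - closed_before) > 0

# Keep only the rows that are outside every start/end block.
result = df.loc[~inside]
print(result)
```

Because the mask is computed per row rather than from neighbours, runs of several consecutive rows outside a block are all kept. Unlike the snippet above, this keeps the original index; add reset_index(drop=True) if a fresh index is wanted.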

Building on Tamil's answer above, this is how I implemented it for my dataframe. It should be more efficient, since it uses itertuples rather than iterrows.

import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'c2': ['1', 'start', '3', 'end', '5', 'start', 'end', 0]})
flag = False  # True while we are between 'start' and 'end'
list_a = []   # rows to remove

for j in df.itertuples():
    if j.c2 == 'start':
        flag = True
        list_a.append(j)
    elif j.c2 == 'end':
        flag = False
        list_a.append(j)
    elif flag:
        list_a.append(j)

to_remove_df = pd.DataFrame(list_a, columns=['index', 'c1', 'c2'])
to_remove_df = to_remove_df["c2"]
# Anti-join on c2: keep the rows of df whose c2 value never appears in to_remove_df.
# Note: this matches by value, so a row outside a block is also removed
# if its c2 value occurs inside some block.
removed_df = pd.merge(df, to_remove_df, on=["c2"], how="outer", indicator=True).query('_merge != "both"').drop(columns='_merge')
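
Since the merge above matches rows by their c2 value, a variant that removes rows by index label may be safer when values repeat. This is a minimal sketch, assuming the index labels are unique; labels_to_drop and inside are illustrative names, not taken from the answer.

```python
import pandas as pd

df = pd.DataFrame({'c1': [1, 2, 3, 4, 5, 6, 7, 8],
                   'c2': ['1', 'start', '3', 'end', '5', 'start', 'end', 0]})

labels_to_drop = []  # index labels of the rows to remove
inside = False       # True while we are between 'start' and 'end'

for row in df.itertuples():
    if row.c2 == 'start':
        inside = True
    if inside:
        # Covers the 'start' row, the rows in between, and the 'end' row.
        labels_to_drop.append(row.Index)
    if row.c2 == 'end':
        inside = False

# Drop all collected rows in a single call instead of one at a time.
removed_df = df.drop(index=labels_to_drop)
print(removed_df)
```

Collecting the labels first and calling drop once avoids building a new dataframe on every iteration, and it never touches rows outside a block even if they share a c2 value with a row inside one.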
