Drop only specific consecutive duplicates in a pandas dataframe

I have the following dataframe, from which I need to drop consecutive duplicate values only if they equal 0.3 or 0.4.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame(index=pd.date_range('20020101', periods=7, freq='D'),
                          data={'poll_support': [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]})
    
In [3]: df
Out[3]:
                poll_support
2002-01-01           0.3
2002-01-02           0.4
2002-01-03           0.4
2002-01-04           0.4
2002-01-05           0.3
2002-01-06           0.5
2002-01-07           0.5

I need the df to look like this:

2002-01-01           0.3
2002-01-02           0.4
2002-01-05           0.3
2002-01-06           0.5
2002-01-07           0.5

I tried:

for var in df['poll_support']:
    if var == 0.3 or var == 0.4:
        df['poll_support']= df['poll_support'].loc[df['poll_support'].shift() != 0.3]
        df['poll_support']= df['poll_support'].loc[df['poll_support'].shift() != 0.4]

However, this does not produce the desired df.
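
If it helps, here is a minimal reproduction of what one of those assignments does (same data as above). As far as I can tell, the shift() comparison reacts to the previous row's value regardless of whether the current row actually repeats it, and assigning the filtered Series back to the column aligns on the index, so the filtered-out rows come back as NaN instead of being removed:

import pandas as pd

df = pd.DataFrame(index=pd.date_range('20020101', periods=7, freq='D'),
                  data={'poll_support': [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]})

# shift() holds the previous row's value, so this mask only reacts to what
# came before, not to whether the current row is a duplicate of it.
mask = df['poll_support'].shift() != 0.3
print(mask.tolist())   # [True, False, True, True, True, False, True]

# Assigning the filtered Series back to the column aligns on the index,
# so the filtered-out rows become NaN rather than being dropped.
df['poll_support'] = df['poll_support'].loc[mask]
print(df)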

I would love to hear suggestions.

Boolean indexing will help. Try:

df[~((df['poll_support'] == df['poll_support'].shift()) & (df['poll_support'].isin([0.3, 0.4])))]

             poll_support
2002-01-01           0.3
2002-01-02           0.4
2002-01-05           0.3
2002-01-06           0.5
2002-01-07           0.5
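
For readability, the same condition can be split into named intermediate steps. This is only a sketch of what the one-liner above is doing, with is_consecutive_dup and is_restricted_value as illustrative names:

import pandas as pd

df = pd.DataFrame(index=pd.date_range('20020101', periods=7, freq='D'),
                  data={'poll_support': [0.3, 0.4, 0.4, 0.4, 0.3, 0.5, 0.5]})

# True where the current value repeats the previous row's value
is_consecutive_dup = df['poll_support'] == df['poll_support'].shift()

# True only for the values that should be de-duplicated
is_restricted_value = df['poll_support'].isin([0.3, 0.4])

# Keep every row that is NOT both a consecutive duplicate and a restricted value
result = df[~(is_consecutive_dup & is_restricted_value)]
print(result)

Note that the 0.5 on 2002-01-07 is also a consecutive duplicate, but it survives because 0.5 is not in the isin list, which is exactly the behaviour the question asks for.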
