Finding first repeated consecutive entries in pandas dataframe

I have a dataframe with two columns, Stock and DueDate, and I need to select the first row from each run of repeated consecutive entries in the Stock column.

df:

[image: df]

I am expecting output like below,

Expected output:

[image: output]

My Approach

The approach I tried is to first mark which rows repeat based on the Stock column by creating a new column repeated_yes, and then keep only the first row of a group if its rows repeat more than twice.

I used the following lines of code to create the new column "repeated_yes":

    ss = df.Stock.ne(df.Stock.shift())
    df['repeated_yes'] = ss.groupby(ss.cumsum()).cumcount() + 1 

The updated dataframe then looks like this:

[image: df_new]
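
Since the screenshots are not reproduced here, the following is a minimal sketch of what the two lines above compute. The Stock and DueDate values are made up for illustration and are not the original data.

```python
import pandas as pd

# Hypothetical sample data (assumption): the original frame was only posted as an image.
df = pd.DataFrame({
    "Stock": ["A", "B", "C", "D", "D", "D", "E", "F", "G", "G"],
    "DueDate": pd.date_range("2020-01-01", periods=10, freq="D"),
})

# True wherever Stock differs from the previous row, i.e. at the start of each run.
ss = df.Stock.ne(df.Stock.shift())

# ss.cumsum() gives every run its own id; cumcount() numbers the rows inside a run.
df["repeated_yes"] = ss.groupby(ss.cumsum()).cumcount() + 1

print(df)
```

On this sample, repeated_yes restarts at 1 whenever Stock changes, so the first row of every run has the value 1 whether or not the run repeats. That is why the counter alone does not separate rows 3 and 8 from the non-repeating rows.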

But I am stuck on subsetting only rows 3 and 8 in order to get the result. If there is a more effective approach, that would be helpful.

Edited: I forgot to include the full question. If there are any other rows below the last row in the dataframe df, it should not display any output.

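Build a second mask with Series.duplicated with keep=False on the cumulative sum, which marks every row that belongs to a run of two or more. Chain it with the first mask using & (bitwise AND) and filter with boolean indexing: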

    ss = df.Stock.ne(df.Stock.shift())
    ss1 = ss.cumsum().duplicated(keep=False)

    df = df[ss & ss1]
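
As a rough end-to-end check, the two masks can be tried on a small hypothetical frame. The data below is invented for illustration (it is not the asker's frame), and the name result is only used here to avoid overwriting df.

```python
import pandas as pd

# Hypothetical sample data (assumption): runs of "D" start at row 3 and of "G" at row 8.
df = pd.DataFrame({
    "Stock": ["A", "B", "C", "D", "D", "D", "E", "F", "G", "G"],
    "DueDate": pd.date_range("2020-01-01", periods=10, freq="D"),
})

# Mask 1: True at the first row of every run of equal consecutive Stock values.
ss = df.Stock.ne(df.Stock.shift())

# Mask 2: the cumulative sum labels each run; duplicated(keep=False) is True
# for every row whose run label occurs more than once (run length >= 2).
ss1 = ss.cumsum().duplicated(keep=False)

# First row of a run AND the run has at least two rows.
result = df[ss & ss1]

print(result.index.tolist())  # [3, 8]
```

Non-repeating rows fail the second mask, and the later rows of a run fail the first, so only the opening row of each repeated run survives.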
