Finding the first row of repeated consecutive entries in a pandas DataFrame

Question

I have a dataframe with two columns, Stock and DueDate, and I need to select the first row of each run of repeated consecutive entries in the Stock column.

df:

I am expecting output like the following:

Expected output:

My Approach

The approach I tried was to first flag every row that repeats in the Stock column by creating a new column, repeated_yes, and then keep only the first row of a run if that value repeats more than twice.

I used the following code to create the new column repeated_yes:

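    # True where Stock differs from the previous row, i.e. the start of a new run
    ss = df.Stock.ne(df.Stock.shift())
    # ss.cumsum() labels each run; cumcount() numbers the rows within a run from 1
    df['repeated_yes'] = ss.groupby(ss.cumsum()).cumcount() + 1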

So the updated dataframe looks like this:

df_new:

But I am stuck on selecting only rows 3 and 8 to get the result. If there is a more efficient approach, that would also help.
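One way to finish the repeated_yes approach is to also compute the length of the run each row belongs to, then keep rows that are the first of their run (repeated_yes == 1) and whose run is long enough. This is a minimal sketch on hypothetical sample data, since the question's actual table is not available; the threshold MIN_LEN is an assumption (2 means "at least two consecutive rows"; use 3 if "more than twice" means three or more).

```python
import pandas as pd

# Hypothetical sample data, not the question's original table
df = pd.DataFrame({
    "Stock": ["A", "B", "B", "B", "C", "D", "D", "E"],
    "DueDate": pd.date_range("2020-01-01", periods=8, freq="D"),
})

# Start-of-run flag and run id, as in the question's code
ss = df.Stock.ne(df.Stock.shift())
run_id = ss.cumsum()
df["repeated_yes"] = ss.groupby(run_id).cumcount() + 1

# Length of the run each row belongs to, mapped back onto every row
run_len = run_id.map(run_id.value_counts())

# Assumed threshold: a run counts as "repeated" if it has at least MIN_LEN rows
MIN_LEN = 2
first_rows = df[(df["repeated_yes"] == 1) & (run_len >= MIN_LEN)]
print(first_rows)
```

With this sample data, the selected rows are the first "B" (index 1) and the first "D" (index 5).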

Edit: I forgot to include the full question. If there are any other rows below the last row in the dataframe df, it should not display any output.

Answer 1

Chain another mask, created by Series.duplicated with keep=False, using & for bitwise AND, and filter with boolean indexing:

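    # True at the first row of each run of consecutive equal Stock values
    ss = df.Stock.ne(df.Stock.shift())
    # True for every row whose run id occurs more than once (run length >= 2)
    ss1 = ss.cumsum().duplicated(keep=False)

    # Keep only the first row of each run that has at least two rows
    df = df[ss & ss1]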
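To see why the two masks combine correctly, here is a self-contained run of the answer's code on hypothetical sample data (not the question's table), with the intermediate values noted in comments. Because run ids from ss.cumsum() are contiguous, an id is duplicated exactly when its run has two or more rows.

```python
import pandas as pd

# Hypothetical sample data, not the question's original table
df = pd.DataFrame({
    "Stock": ["A", "B", "B", "B", "C", "D", "D", "E"],
    "DueDate": pd.date_range("2020-01-01", periods=8, freq="D"),
})

# ss: start of each run  -> True, True, False, False, True, True, False, True
ss = df.Stock.ne(df.Stock.shift())
# ss.cumsum(): run ids   -> 1, 2, 2, 2, 3, 4, 4, 5
# ss1: run id repeated   -> False, True, True, True, False, True, True, False
ss1 = ss.cumsum().duplicated(keep=False)

# First row of a run AND the run has at least two rows
result = df[ss & ss1]
print(result)
```

One caveat: NaN never compares equal to NaN, so consecutive missing Stock values are each treated as the start of a separate run and are never selected.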

Answer 1: score 2, accepted, 2020-11-03 11:10:01