Getting duplicates and their index with pandas

Question

I have a dataframe that I created using the "duplicated" function; it looks like this:

IX  Campaign_Response  Gender  Presence_of_Child  Marital_Status  Age_Group_ID  Cluster  Income_Group  Payer_Type  Race  dwell_type  education  Region  is_duplicated
7   0                  0       1                  1               1             18       D             NK          W     S           2          3       True
27  0                  0       1                  1               2             13       E             PK          W     S           5          4       True
43  0                  0       1                  1               2             8        H             NK          H     S           5          3       True
(The remaining rows follow the same column layout.)
80  1                  0       1                  1               4             7        F             NK          H     S           1          3       True
81  1                  0       1                  1               4             7        F             NK          H     S           1          3       True
82  1                  0       1                  1               4             7        F             NK          H     S           1          3       True

What I want is to find the index numbers of the duplicated rows, along with an instance of each row. In other words, I want to see each case of row duplication together with the row contents, so I can examine the characteristics of the duplicated rows.

I was thinking of some kind of group-by, but that wiped out the index numbers. I also need to see the Campaign response, which is not included in the "find duplicates" function; I expect that a number of otherwise identical records have differing responses and, of course, different index numbers.

So the desired output could look like this (any alternative way of showing it is fine):

80  1   0   1   1   4   7   F   NK  H   S   1   3   True
81  1   0   1   1   4   7   F   NK  H   S   1   3   True *** <<< indicating duplicate of prior record (as many occurrences as required)
82  1   0   1   1   4   7   F   NK  H   S   1   3   True
391  1   0   1   1   4   7   F   NK  H   S   1   3   True****
508  1   0   1   1   4   7   F   NK  H   S   1   3   True****
83  1   0   1   1   4   7   F   NK  H   S   1   3   True
108  1   0   1   1   4   7   F   NK  H   S   1   3   True *** another dupe

Answer 1

Assuming your DataFrame is named df, you can get the index values of the duplicates like this:

idx_dups = df[df.duplicated()].index
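
Note that df.duplicated() with default arguments compares all columns and leaves the first occurrence of each row unflagged. To see every member of each duplicate set alongside its index, while ignoring Campaign_Response in the comparison, something like the following sketch may help. The sample data is made up for illustration (only the column names come from the question), and key_cols, dupes, and index_groups are hypothetical names.

```python
import pandas as pd

# Hypothetical sample data: column names follow the question, values are invented.
df = pd.DataFrame(
    {
        "Campaign_Response": [0, 1, 0, 0, 1],
        "Gender": [0, 0, 0, 1, 0],
        "Cluster": [18, 7, 7, 13, 7],
        "Income_Group": ["D", "F", "F", "E", "F"],
    },
    index=[7, 80, 81, 27, 391],
)

# Compare rows on every column except the response, which may differ
# between otherwise identical records.
key_cols = [c for c in df.columns if c != "Campaign_Response"]

# keep=False flags every member of a duplicate set, including the first one.
dupes = df[df.duplicated(subset=key_cols, keep=False)]

# Sorting on the key columns places identical rows next to each other;
# the original index labels are preserved.
dupes_sorted = dupes.sort_values(key_cols)
print(dupes_sorted)

# Map each duplicated combination of key values to the index labels sharing it.
index_groups = {
    key: list(labels) for key, labels in dupes.groupby(key_cols).groups.items()
}
print(index_groups)
```

Because the filtering is done with a boolean mask rather than an aggregation, the index numbers survive, and Campaign_Response stays visible in the result even though it plays no part in deciding what counts as a duplicate.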

Answer 1 | 0 | 2015-08-14 03:56:06
