
Get duplicates together with index with pandas

I have a dataframe that I created using the "duplicated" function. It looks like this:

 IX  Campaign_Response  Gender  Presence_of_Child  Marital_Status  Age_Group_ID  Cluster  Income_Group  Payer_Type  Race  dwell_type  education  Region  is_duplicated
  7                  0       0                  1               1             1       18             D          NK     W           S          2       3           True
 27                  0       0                  1               1             2       13             E          PK     W           S          5       4           True
 43                  0       0                  1               1             2        8             H          NK     H           S          5       3           True
 80                  1       0                  1               1             4        7             F          NK     H           S          1       3           True
 81                  1       0                  1               1             4        7             F          NK     H           S          1       3           True
 82                  1       0                  1               1             4        7             F          NK     H           S          1       3           True

What I want is to find the index numbers of the duplicated rows, together with an instance of each row. I want to be able to see the instances of row duplication and the row contents, so that I can see what the characteristics of the duplicated rows are.

I was thinking of some kind of group by, but that wiped out the index number. I also need to see the Campaign_Response, which is not included in the "find duplicates" function. I expect that a number of otherwise identical records have differing responses and, of course, different index numbers.

The desired output could look like this (any alternative way of showing it is fine):

80  1   0   1   1   4   7   F   NK  H   S   1   3   True
81  1   0   1   1   4   7   F   NK  H   S   1   3   True *** <<< indicating dupe of prior record (as many occurrences as required)
82  1   0   1   1   4   7   F   NK  H   S   1   3   True
391  1   0   1   1   4   7   F   NK  H   S   1   3   True****
508  1   0   1   1   4   7   F   NK  H   S   1   3   True****
83  1   0   1   1   4   7   F   NK  H   S   1   3   True
108  1   0   1   1   4   7   F   NK  H   S   1   3   True *** another dupe

Assuming your DataFrame is named df, you can simply get the index values of the duplicates as follows:

idx_dups = df[df.duplicated()].index
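
Note that with the default keep='first', duplicated() leaves the first occurrence of each row unflagged, so the line above returns only the later copies. To see every member of each duplicate group side by side, with the original index intact and Campaign_Response left out of the comparison, something like the following may work. This is only a sketch: the sample data and the names key_cols, dups, dups_sorted and dup_index_groups are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data, loosely modelled on a few of the question's columns.
df = pd.DataFrame(
    {
        "Campaign_Response": [0, 1, 1, 0, 1],
        "Gender": [0, 0, 0, 0, 0],
        "Cluster": [18, 7, 7, 13, 7],
        "Income_Group": ["D", "F", "F", "E", "F"],
    },
    index=[7, 80, 81, 27, 391],
)

# Compare on every column except the response, so that otherwise identical
# records with differing responses still count as duplicates.
key_cols = [c for c in df.columns if c != "Campaign_Response"]

# keep=False flags every member of a duplicate group, including the first one.
dups = df[df.duplicated(subset=key_cols, keep=False)]

# Sort by the key columns so duplicates sit next to each other;
# the original index labels travel with the rows.
dups_sorted = dups.sort_values(key_cols)

# Map each duplicated combination of key values to the index labels sharing it.
dup_index_groups = dups.groupby(key_cols).groups

print(dups_sorted)
print(dup_index_groups)
```

Because the filtering is done with a boolean mask rather than an aggregation, the index is never reset, and Campaign_Response stays visible in the result even though it is excluded from the duplicate check.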
