Get duplicates together with index with pandas
I have a dataframe that I created using the "duplicated" function; it looks like this:
IX Campaign_Response Gender Presence_of_Child Marital_Status Age_Group_ID Cluster Income_Group Payer_Type Race dwell_type education Region is_duplicated
7 0 0 1 1 1 18 D NK W S 2 3 True
27 0 0 1 1 2 13 E PK W S 5 4 True
43 0 0 1 1 2 8 H NK H S 5 3 True
... (the remaining rows are laid out roughly the same way) ...
80 1 0 1 1 4 7 F NK H S 1 3 True
81 1 0 1 1 4 7 F NK H S 1 3 True
82 1 0 1 1 4 7 F NK H S 1 3 True
What I want is to find the index numbers of the duplicated rows together with the rows themselves, so that I can see each instance of duplication and the row contents, and work out what characterizes the duplicated rows.
I was thinking of some kind of groupby, but that wiped out the index numbers. I also need to see Campaign_Response, which is not included in the columns checked by the "find duplicates" function: I expect that a number of otherwise identical records have differing responses (and, of course, different index numbers).
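One way to keep the index while leaving Campaign_Response out of the comparison is sketched below. It is only an illustration under assumptions: the sample frame is made up and uses just a few of the question's columns, and the comparison columns are taken to be every column except Campaign_Response. Passing `subset=` and `keep=False` to `DataFrame.duplicated` flags every member of each duplicate group, and plain boolean filtering (unlike a groupby aggregation) keeps the original index labels.

```python
import pandas as pd

# Made-up sample using a few of the question's columns (hypothetical values).
df = pd.DataFrame(
    {
        "Campaign_Response": [0, 1, 0, 1, 0],
        "Gender": [0, 0, 0, 0, 1],
        "Cluster": [7, 7, 7, 18, 7],
        "Income_Group": ["F", "F", "F", "D", "F"],
        "Race": ["H", "H", "H", "W", "H"],
    },
    index=[80, 81, 82, 7, 391],
)

# Compare on every column except the response we want to inspect.
key_cols = [c for c in df.columns if c != "Campaign_Response"]

# keep=False flags *every* member of a duplicate group, including the first one.
mask = df.duplicated(subset=key_cols, keep=False)

# Boolean indexing keeps the original index labels and all columns.
dups = df[mask]
print(dups)
```

Here rows 80, 81 and 82 are identical on the key columns but differ in Campaign_Response, and all three keep their original index labels.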
So the desired output could look like this (any alternative way of showing it is fine):
80 1 0 1 1 4 7 F NK H S 1 3 True
81 1 0 1 1 4 7 F NK H S 1 3 True *** <<< indicating a dupe of the prior record (as many occurrences as required)
82 1 0 1 1 4 7 F NK H S 1 3 True
391 1 0 1 1 4 7 F NK H S 1 3 True ***
508 1 0 1 1 4 7 F NK H S 1 3 True ***
83 1 0 1 1 4 7 F NK H S 1 3 True
108 1 0 1 1 4 7 F NK H S 1 3 True *** another dupe
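A rough sketch of producing output in that shape, again on made-up data with a reduced set of columns: keep every duplicate row, number each group with `groupby(...).ngroup()`, sort so that group members sit next to each other, and flag every occurrence after the first. The column names `dup_group` and `repeat` are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Hypothetical sample: rows 80, 81, 391 share the same attributes, as do 83 and 108.
df = pd.DataFrame(
    {
        "Campaign_Response": [1, 1, 0, 1, 1, 0],
        "Cluster": [7, 7, 8, 7, 13, 13],
        "Income_Group": ["F", "F", "H", "F", "E", "E"],
        "Region": [3, 3, 3, 3, 4, 4],
    },
    index=[80, 81, 43, 391, 83, 108],
)
key_cols = ["Cluster", "Income_Group", "Region"]

# Keep only rows that belong to some duplicate group (all occurrences).
dups = df[df.duplicated(subset=key_cols, keep=False)].copy()

# Give each group of identical key values an integer id (groups sorted by key).
dups["dup_group"] = dups.groupby(key_cols).ngroup()

# Sort by index, then stably by group, so members are adjacent in index order.
dups = dups.sort_index().sort_values("dup_group", kind="mergesort")

# Mark every occurrence after the first within its group (the "***" rows).
dups["repeat"] = dups.duplicated(subset=key_cols, keep="first")
print(dups)
```

The `repeat` column plays the role of the `***` marker in the desired output, and each group's rows keep their original index and Campaign_Response values.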
Assuming your DataFrame is named df, you can simply get the index values of the duplicates like this:
idx_dups = df[df.duplicated()].index
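Note that `duplicated()` defaults to `keep="first"`, so the index above leaves out the first occurrence of each duplicated row. A small comparison on hypothetical data shows the difference when `keep=False` is passed instead:

```python
import pandas as pd

# Hypothetical data: three identical rows plus one unique row.
df = pd.DataFrame(
    {"Cluster": [7, 7, 7, 18], "Income_Group": ["F", "F", "F", "D"]},
    index=[80, 81, 82, 7],
)

# Default keep="first": the first occurrence is not flagged, so 80 is missing.
idx_later = df[df.duplicated()].index

# keep=False flags every occurrence, so the first copy (80) is included too.
idx_all = df[df.duplicated(keep=False)].index

print(idx_later.tolist())  # [81, 82]
print(idx_all.tolist())    # [80, 81, 82]
```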