
How can I count the number of duplicated rows?

Hello, I am working with a DataFrame (DF) and have the following question:

How can I count duplicates in data like this:

A    B    C
1    2    3
1    2    3
1    1    2
2    1    2
2    2    1
3    2    1

How can I count, for example, that column A has 2 duplicates, because 3 rows repeat the value 1 and 2 rows repeat the value 2?

And how could I count that there is 1 duplicate across whole rows, because only once are two rows identical, as you can see with 1 2 3?

Thanks

df.groupby(['A','B','C']).size()
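As a minimal sketch using the question's sample data: grouping on all columns and calling `size()` returns how many times each distinct row occurs. `DataFrame.value_counts()`, available in pandas 1.1 and later, returns the same counts sorted by frequency rather than by the key columns. The variable name `counts` is illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [2, 2, 1, 1, 2, 2],
                   'C': [3, 3, 2, 2, 1, 1]})

# Each group is one distinct row; size() counts how often that row occurs
counts = df.groupby(['A', 'B', 'C']).size()
print(counts)

# Same counts, sorted by frequency (pandas >= 1.1)
print(df.value_counts())
```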

I think this will help you solve your problem:

from pandas import DataFrame

if __name__ == '__main__':
    d = {'A': [1, 1, 1, 2, 2, 3],
         'B': [2, 2, 1, 1, 2, 2],
         'C': [3, 3, 2, 2, 1, 1]}

    df = DataFrame(d)
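    # duplicated() marks each row identical to an earlier one (keep='first'),
    # so only the second copy of 1 2 3 is selected
    duplicated_rows = df[df.duplicated()]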
    print(duplicated_rows)

Output:

   A  B  C
1  1  2  3
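To turn this mask into a number, a minimal sketch: summing `df.duplicated()` gives the number of extra copies, while `keep=False` selects every row that has an identical twin, originals included. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [2, 2, 1, 1, 2, 2],
                   'C': [3, 3, 2, 2, 1, 1]})

# keep='first' (the default) flags only the repeats, so the sum
# is the number of extra copies
n_extra_copies = int(df.duplicated().sum())

# keep=False flags every member of a duplicated group
all_dup_rows = df[df.duplicated(keep=False)]

print(n_extra_copies)
print(all_dup_rows)
```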

I understand you need duplicates per column. If so, use a boolean mask to flag the first row of each run of consecutive repeated values, then cumsum() to number those runs and max() to get the count for each column.

df.apply(lambda x: ((x==x.shift(-1))&(x.diff()!=0)).cumsum().max())

A    2
B    3
C    3
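The mask above counts runs of consecutive equal values, which is why B gives 3: in B the value 2 appears in two separate runs. If you instead want the number of distinct values that occur more than once anywhere in a column, as the question's example for A suggests, one possible sketch uses `value_counts()`. This reading of "duplicate" is an assumption. For the sample data it gives A=2, B=2, C=3.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [2, 2, 1, 1, 2, 2],
                   'C': [3, 3, 2, 2, 1, 1]})

# Run-based count from the answer: a row starts a run of repeats when it
# differs from the previous row but equals the next one.
# Summing the mask is equivalent to cumsum().max().
runs = df.apply(lambda x: ((x == x.shift(-1)) & (x.diff() != 0)).sum())

# Order-independent alternative (assumed interpretation):
# distinct values that occur more than once in each column
repeated_values = df.apply(lambda x: (x.value_counts() > 1).sum())

print(runs)
print(repeated_values)
```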

If you want duplicates along each row instead, mark the duplicated values in each row, convert to integer and sum:

df.apply(lambda x: x.duplicated(keep=False), axis=1).astype(int).sum(axis=1)
0    0
1    0
2    2
3    2
4    2
5    0
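A sketch of the same per-row count, plus a vectorised variant based on `nunique(axis=1)`. The variant counts extra copies per row instead of cells involved in a duplicate, so a row like 1 1 2 scores 1 rather than 2. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [2, 2, 1, 1, 2, 2],
                   'C': [3, 3, 2, 2, 1, 1]})

# Cells in each row whose value also appears elsewhere in that row
cells_in_dups = df.apply(lambda row: row.duplicated(keep=False), axis=1).sum(axis=1)

# Extra copies per row: row width minus the number of distinct values
extra_copies = df.shape[1] - df.nunique(axis=1)

print(cells_in_dups)
print(extra_copies)
```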

For the second part of your question, do what @Cody Gray did:

df.groupby(['A', 'B', 'C']).size()

A  B  C
1  1  2    1
   2  3    2
2  1  2    1
   2  1    1
3  2  1    1
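To reduce these counts to the single number the question asks for (how many distinct rows appear more than once), a minimal sketch keeps only the groups whose size exceeds 1. For the sample data this gives 1. It counts distinct duplicated rows, whereas `df.duplicated().sum()` counts extra copies. The two differ when a row appears three or more times.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [2, 2, 1, 1, 2, 2],
                   'C': [3, 3, 2, 2, 1, 1]})

counts = df.groupby(['A', 'B', 'C']).size()

# Distinct rows that occur more than once
n_duplicated_patterns = int((counts > 1).sum())

# Which rows are duplicated, and how often each occurs
dup_counts = counts[counts > 1]

print(n_duplicated_patterns)
print(dup_counts)
```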

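Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.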

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM