How can I count the number of duplicated rows?

Question

Hello, I am working with a DataFrame and I have the following question:

How can I count the number of duplicates in data like this:

A    B    C
1    2    3
1    2    3
1    1    2
2    1    2
2    2    1
3    2    1

How can I count, for example, that column A has 2 duplicates, because the value 1 appears in 3 rows and the value 2 appears in 2 rows?

And how could I count that the number of duplicated rows is 1, because there is only one case where two entire rows are identical, as you can see with 1 2 3?

Thanks

Answer 1

df.groupby(['A','B','C']).size()
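
As a minimal sketch of what this one-liner gives you, assuming the question's table is rebuilt by hand as `df`: `size()` returns the number of rows for each distinct (A, B, C) combination, and counting the groups larger than 1 gives the number of row patterns that repeat. The names `sizes` and `n_duplicated_patterns` are illustrative only, not part of the original answer.

```python
import pandas as pd

# The table from the question, rebuilt by hand (assumption).
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [2, 2, 1, 1, 2, 2],
                   'C': [3, 3, 2, 2, 1, 1]})

# Number of rows in each distinct (A, B, C) combination.
sizes = df.groupby(['A', 'B', 'C']).size()

# A combination is duplicated when it occurs more than once.
n_duplicated_patterns = int((sizes > 1).sum())
print(n_duplicated_patterns)  # 1, only the row (1, 2, 3) repeats
```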

Answer 2

I think this will help you solve your problem:

from pandas import DataFrame

if __name__ == '__main__':
    d = {'A': [1, 1, 1, 2, 2, 3],
         'B': [2, 2, 1, 1, 2, 2],
         'C': [3, 3, 2, 2, 1, 1]}

    df = DataFrame(d)
    duplicated_rows = df[df.duplicated()]
    print(duplicated_rows)

Output:

   A  B  C
1  1  2  3
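
If only the count is needed rather than the rows themselves, the boolean mask from `duplicated()` can be summed directly. This is a small sketch under the assumption that `df` is the same frame as above; `extra_copies` and `rows_involved` are hypothetical names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [2, 2, 1, 1, 2, 2],
                   'C': [3, 3, 2, 2, 1, 1]})

# Default keep='first': only the repeated copies are flagged.
extra_copies = int(df.duplicated().sum())

# keep=False: every row that has an identical twin is flagged.
rows_involved = int(df.duplicated(keep=False).sum())

print(extra_copies, rows_involved)  # 1 2
```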

Answer 3

As I understand it, you need the duplicates per column. If so, use boolean selection to identify the first element of each run of duplicates, then use cumsum() to number the runs and take the maximum.

df.apply(lambda x: ((x==x.shift(-1))&(x.diff()!=0)).cumsum().max())

A    2
B    3
C    3
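
Note that the expression above compares each value with its neighbours, so it counts runs of adjacent equal values. As a hedged alternative (not from the original answer), the sketch below counts how many distinct values occur more than once in each column, regardless of row order, using `value_counts()`. On the question's data the two approaches agree for A and C but differ for B, where the value 2 appears in two separate runs.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 3],
                   'B': [2, 2, 1, 1, 2, 2],
                   'C': [3, 3, 2, 2, 1, 1]})

# For each column, count the distinct values that occur more than once.
repeated_values = df.apply(lambda s: int((s.value_counts() > 1).sum()))
print(repeated_values)  # A -> 2, B -> 2, C -> 3
```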

If you want duplicates along the rows instead, find the duplicated values, convert them to integers, and sum:

((df.apply(lambda x: x.duplicated(keep=False), axis=1)).astype(int)).sum(axis=1)
0    0
1    0
2    2
3    2
4    2
5    0

For the second part of your question, do what @Cody Gray did, as follows:

df.groupby(['A', 'B', 'C']).agg(lambda x: x.duplicated(keep='last').count())

A  B  C
1  1  2    1
   2  3    2
2  1  2    1
   2  1    1
3  2  1    1

Answer 1
3 2020-07-30 09:51:44

Answer 2
2 2020-07-30 10:10:25

Answer 3
2 Accepted 2020-07-30 11:09:35
