Count unique rows in pandas

Question

I want to number the unique rows in a pandas DataFrame and add the result as a new column, count_index, as in the example below. In other words, duplicate rows should share the same index value.

import pandas as pd
df = {'A': [ 8,8,9,9,9,12,12,13,15,15,15],
      'B': [ 1,1,2,2,2,11,11,3,4,4,4],
      'C': [ 10,10,20,20,20,101,101,30,40,40,40],
      'D': [81,81,92,92,92,121,121,134,150,150,150]}
df = pd.DataFrame(df)

print(df.groupby(['A','B','C','D']).size())
#####################################################
# input
    A    B     C     D
    8    1    10    81
    8    1    10    81
    9    2    20    92
    9    2    20    92
    9    2    20    92
   12   11   101   121
   12   11   101   121
   13    3    30   134
   15    4    40   150
   15    4    40   150
   15    4    40   150
#####################################################
# expected output
    A    B     C     D   Count_index
    8    1    10    81   1
    8    1    10    81   1
    9    2    20    92   2
    9    2    20    92   2
    9    2    20    92   2
   12   11   101   121   3
   12   11   101   121   3
   13    3    30   134   4
   15    4    40   150   5
   15    4    40   150   5
   15    4    40   150   5

Answer 1

You can do this by inverting the result of .duplicated(), so that the first occurrence of each row is marked True and every repeat is marked False. A cumulative sum over that mask then keeps a running count of the unique rows encountered so far.

df['count_index'] = (~df.duplicated(keep="first")).cumsum()

print(df)
     A   B    C    D  count_index
0    8   1   10   81            1
1    8   1   10   81            1
2    9   2   20   92            2
3    9   2   20   92            2
4    9   2   20   92            2
5   12  11  101  121            3
6   12  11  101  121            3
7   13   3   30  134            4
8   15   4   40  150            5
9   15   4   40  150            5
10  15   4   40  150            5
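
One caveat worth illustrating: the cumulative sum is a running total, so it gives each distinct row its own number only when duplicates sit next to each other, as they do in the question. The sketch below uses a small made-up frame (not the question's data) in which a row reappears later, and compares the result with groupby(...).ngroup(), which numbers groups by first appearance when sort=False. It is offered as one possible alternative, not as what the answer intended.

```python
import pandas as pd

# Hypothetical data: the rows (8, 1) and (9, 2) reappear after other rows.
df = pd.DataFrame({'A': [8, 9, 8, 9],
                   'B': [1, 2, 1, 2]})

# Running count of first occurrences: later repeats inherit the current total.
cumsum_ids = (~df.duplicated(keep="first")).cumsum()

# Group number in order of first appearance (sort=False), shifted to start at 1.
group_ids = df.groupby(list(df.columns), sort=False).ngroup() + 1

print(cumsum_ids.tolist())  # [1, 2, 2, 2]
print(group_ids.tolist())   # [1, 2, 1, 2]
```

If the duplicates are always contiguous, both approaches give the same numbering.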

Answer 2

You can use either diff().ne(0) or df.ne(df.shift()). Both compare each row with the row directly above it, so they assume that duplicate rows are adjacent.

df.diff().ne(0).all(axis=1).cumsum()

or

df.ne(df.shift()).all(axis=1).cumsum()
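
Note that .all(axis=1) starts a new group only when every column differs from the previous row. That holds for the question's data, but a row that differs in just one column would not be counted as new. The sketch below, on a small made-up frame, contrasts it with .any(axis=1); treating "any column changed" as the definition of a new row is an assumption about what is wanted here.

```python
import pandas as pd

# Hypothetical data: between rows 1 and 2 only column B changes,
# and between rows 2 and 3 only column A changes.
df = pd.DataFrame({'A': [8, 8, 8, 9],
                   'B': [1, 1, 2, 2]})

# True where a cell differs from the cell directly above it
# (the first row compares against NaN, so it is always True).
changed = df.ne(df.shift())

all_ids = changed.all(axis=1).cumsum()  # new group only if every column changed
any_ids = changed.any(axis=1).cumsum()  # new group if at least one column changed

print(all_ids.tolist())  # [1, 1, 1, 1]
print(any_ids.tolist())  # [1, 1, 2, 3]
```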

