Python Pandas differing value_counts() in two columns of same len()
I have a pandas data frame that contains two columns: trace numbers [col_1] and ID numbers [col_2]. Trace numbers can be duplicated, as can ID numbers; however, each trace and each ID should correspond to only one specific partner in the adjacent column.
Each of my two columns has the same length, but a different unique value count, even though the counts should be the same, as shown below:
in[1]: Trace | ID
1 | 5054
2 | 8291
3 | 9323
4 | 9323
... |
100 | 8928
in[2]: print('unique traces: ', df['Trace'].nunique())
print('unique IDs: ', df['ID'].nunique())
out[3]: unique traces: 100
unique IDs: 99
In the code above, the same ID number (9323) is represented by two Trace numbers (3 and 4) - how can I isolate these occurrences? Thanks for looking!
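A minimal, scaled-down reproduction of the setup (toy values standing in for the asker's real data) shows why the two unique counts disagree:

```python
import pandas as pd

# Toy data: two Trace numbers (3 and 4) share the same ID (9323),
# so ID has one fewer unique value than Trace.
df = pd.DataFrame({
    'Trace': [1, 2, 3, 4, 5],
    'ID':    [5054, 8291, 9323, 9323, 8928],
})

# nunique() counts distinct values per column.
print('unique traces:', df['Trace'].nunique())  # 5
print('unique IDs:   ', df['ID'].nunique())     # 4
```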
By using the duplicated() function (docs), you can do the following:
df[df['ID'].duplicated(keep=False)]
By setting keep to False, we get all the duplicates (instead of excluding the first or the last occurrence). Which returns:
Trace ID
2 3 9323
3 4 9323
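On the toy data from the question (values assumed), the same check can also be run on the Trace column, or on both columns at once, to catch a mismatch in either direction - a small extension of the answer above:

```python
import pandas as pd

df = pd.DataFrame({'Trace': [1, 2, 3, 4, 5],
                   'ID':    [5054, 8291, 9323, 9323, 8928]})

# keep=False marks every member of a duplicate group, not just the repeats.
dupes = df[df['ID'].duplicated(keep=False)]
print(dupes)  # rows with Trace 3 and 4, both ID 9323

# Catch duplicates in either column with a boolean OR.
either = df[df['ID'].duplicated(keep=False) | df['Trace'].duplicated(keep=False)]
```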
You can use groupby and filter:
df.groupby('ID').filter(lambda x: x.Trace.nunique() > 1)
Output:
Trace ID
2 3 9323.0
3 4 9323.0
# This should tell you the indices of non-unique Traces or IDs.
df.groupby('ID').filter(lambda x: len(x)>1)
Out[85]:
Trace ID
2 3 9323
3 4 9323
df.groupby('Trace').filter(lambda x: len(x)>1)
Out[86]:
Empty DataFrame
Columns: [Trace, ID]
Index: []
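Running both filters on the toy data (values assumed) confirms the asymmetry the asker observed: duplicate IDs exist, duplicate Traces do not:

```python
import pandas as pd

df = pd.DataFrame({'Trace': [1, 2, 3, 4, 5],
                   'ID':    [5054, 8291, 9323, 9323, 8928]})

# Groups of rows sharing an ID: non-empty, because 9323 appears twice.
bad_ids = df.groupby('ID').filter(lambda x: len(x) > 1)
print(bad_ids)

# Groups of rows sharing a Trace: empty, because every Trace is unique.
bad_traces = df.groupby('Trace').filter(lambda x: len(x) > 1)
print(bad_traces)  # Empty DataFrame
```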