Python Pandas differing value_counts() in two columns of same len()
I have a pandas data frame that contains two columns: trace numbers [col_1] and ID numbers [col_2]. Trace numbers can be duplicated, as can ID numbers; however, each trace and each ID should correspond to only one specific partner in the adjacent column.
Each of my two columns has the same length, but a different unique value count, even though the counts should be the same, as shown below:
in[1]: Trace | ID
1 | 5054
2 | 8291
3 | 9323
4 | 9323
... |
100 | 8928
in[2]: print('unique traces: ', df['Trace'].nunique())
print('unique IDs: ', df['ID'].nunique())
out[3]: unique traces: 100
unique IDs: 99
In the code above, the same ID number (9323) is represented by two Trace numbers (3 and 4) - how can I isolate these occurrences? Thanks for looking!
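A minimal, scaled-down reproduction of the setup (toy values standing in for the asker's real data) shows why the two unique counts disagree:

```python
import pandas as pd

# Toy data: two Trace numbers (3 and 4) share the same ID (9323),
# so ID has one fewer unique value than Trace.
df = pd.DataFrame({
    'Trace': [1, 2, 3, 4, 5],
    'ID':    [5054, 8291, 9323, 9323, 8928],
})

# nunique() counts distinct values per column.
print('unique traces:', df['Trace'].nunique())  # 5
print('unique IDs:   ', df['ID'].nunique())     # 4
```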
By using the duplicated() function (docs), you can do the following:
df[df['ID'].duplicated(keep=False)]
By setting keep to False, we get all the duplicates (instead of excluding the first or the last occurrence). Which returns:
Trace ID
2 3 9323
3 4 9323
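On the toy data from the question (values assumed), the same check can also be run on the Trace column, or on both columns at once, to catch a mismatch in either direction - a small extension of the answer above:

```python
import pandas as pd

df = pd.DataFrame({'Trace': [1, 2, 3, 4, 5],
                   'ID':    [5054, 8291, 9323, 9323, 8928]})

# keep=False marks every member of a duplicate group, not just the repeats.
dupes = df[df['ID'].duplicated(keep=False)]
print(dupes)  # rows with Trace 3 and 4, both ID 9323

# Catch duplicates in either column with a boolean OR.
either = df[df['ID'].duplicated(keep=False) | df['Trace'].duplicated(keep=False)]
```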
You can use groupby and filter:
df.groupby('ID').filter(lambda x: x.Trace.nunique() > 1)
Output:
Trace ID
2 3 9323.0
3 4 9323.0
# This should tell you the indices of non-unique Traces or IDs.
df.groupby('ID').filter(lambda x: len(x)>1)
Out[85]:
Trace ID
2 3 9323
3 4 9323
df.groupby('Trace').filter(lambda x: len(x)>1)
Out[86]:
Empty DataFrame
Columns: [Trace, ID]
Index: []
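Running both filters on the toy data (values assumed) confirms the asymmetry the asker observed: duplicate IDs exist, duplicate Traces do not:

```python
import pandas as pd

df = pd.DataFrame({'Trace': [1, 2, 3, 4, 5],
                   'ID':    [5054, 8291, 9323, 9323, 8928]})

# Groups of rows sharing an ID: non-empty, because 9323 appears twice.
bad_ids = df.groupby('ID').filter(lambda x: len(x) > 1)
print(bad_ids)

# Groups of rows sharing a Trace: empty, because every Trace is unique.
bad_traces = df.groupby('Trace').filter(lambda x: len(x) > 1)
print(bad_traces)  # Empty DataFrame
```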