How to compare the values of two columns in pandas

Question

I have a DataFrame with unique IDs in two of its columns. For example:

S.no. Column1 Column2
    1  00001x  00002x
    2  00003j  00005k
    3  00002x  00001x
    4  00004d  00008e

The values can be any strings. I want to compare the two columns so that only one of the rows with S.no. 1 and 3 remains, because those rows contain the same IDs; only their order differs.

Basically, if one row has X in Column1 and Y in Column2, and another row has Y in Column1 and X in Column2, then only one of the two rows should remain.

Is that possible in Python?

Answer 1

You can convert your columns to a frozenset per row.

Both orderings of a pair then map to the same hashable value, so duplicated can flag the repeats.

Finally, slice the rows using the previous output as a mask:

mask = df.filter(like='Column').apply(frozenset, axis=1).duplicated()
df[~mask]

Previous answer, using set:

mask = df.filter(like='Column').apply(lambda x: tuple(set(x)), axis=1).duplicated()
df[~mask]

NB. When using set or sorted, the result must be converted to a tuple (for example lambda x: tuple(sorted(x))), because duplicated hashes the values, which is not possible with mutable objects such as set or list. Of the two, tuple(sorted(x)) is the safer choice, since two equal sets are not guaranteed to iterate in the same order.
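The hashing requirement in the note above can be checked without pandas. This is a minimal sketch in plain Python; the names row_a, row_b, key_a and key_b are made up for illustration, and the two pairs are taken from rows 1 and 3 of the question.

```python
# Two rows from the question that hold the same pair in opposite order.
row_a = ("00001x", "00002x")
row_b = ("00002x", "00001x")

# A plain set is mutable, so Python refuses to hash it.
try:
    hash(set(row_a))
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# A frozenset ignores order and is hashable, so both rows give the same key.
same_frozenset = frozenset(row_a) == frozenset(row_b)
same_hash = hash(frozenset(row_a)) == hash(frozenset(row_b))

# Sorting gives a canonical order; tuple() makes the sorted list hashable.
key_a = tuple(sorted(row_a))
key_b = tuple(sorted(row_b))
```

With more than two columns, note that a frozenset drops repeated values within a row, so tuple(sorted(x)) is likely the better key if repeats matter.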

Output (identical for both versions):

   S.no. Column1 Column2
0      1  00001x  00002x
1      2  00003j  00005k
3      4  00004d  00008e
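For anyone who wants to reproduce this end to end, here is a self-contained sketch of the frozenset approach. It assumes the sample data from the question; the intermediate names keys and result are not part of the original answer and are only there to make each step visible.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "S.no.": [1, 2, 3, 4],
    "Column1": ["00001x", "00003j", "00002x", "00004d"],
    "Column2": ["00002x", "00005k", "00001x", "00008e"],
})

# Build one order-independent, hashable key per row from the ID columns.
keys = df.filter(like="Column").apply(frozenset, axis=1)

# duplicated() marks every repeat of a key after its first occurrence.
mask = keys.duplicated()

# Keep only the first row of each unordered pair.
result = df[~mask]
print(result)
```

The original index is kept, so the surviving rows are labelled 0, 1 and 3. Appending .reset_index(drop=True) to df[~mask] would renumber them if that is preferred.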
