How to compare the values of two columns in pandas

I have a DataFrame that has some unique IDs in two of its columns, for example:

S.no. Column1 Column2
    1  00001x  00002x
    2  00003j  00005k
    3  00002x  00001x
    4  00004d  00008e

The values can be any strings. I want to compare the two columns so that only one of rows 1 and 3 remains, since those rows contain the same IDs and only their order differs.

Basically, if one row has X in Column1 and Y in Column2, and another row has Y in Column1 and X in Column2, then only one of those rows should remain.

Is that possible in Python?

You can convert your columns to a frozenset per row.

This gives each pair of values an order-independent, hashable key to which duplicated can be applied.

Finally, select the rows using the negation of the previous output as a mask:

mask = df.filter(like='Column').apply(frozenset, axis=1).duplicated()
df[~mask]
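
For reference, here is a minimal, self-contained sketch of the steps above. It rebuilds the sample data from the question; the variable names keys and result are only illustrative, and it assumes the two ID columns are the only ones whose labels contain 'Column'.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "S.no.": [1, 2, 3, 4],
    "Column1": ["00001x", "00003j", "00002x", "00004d"],
    "Column2": ["00002x", "00005k", "00001x", "00008e"],
})

# A frozenset ignores element order and is hashable, so rows 1 and 3
# map to the same key.
keys = df.filter(like="Column").apply(frozenset, axis=1)

# duplicated() marks every repeat of a key after its first occurrence.
mask = keys.duplicated()

# Keep only the first row of each unordered pair.
result = df[~mask]
print(result)
```

Note that a frozenset also collapses repeated values inside a single row, so a row holding the same ID in both columns gets a one-element key.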

Previous answer, using set:

mask = df.filter(like='Column').apply(lambda x: tuple(set(x)), axis=1).duplicated()
df[~mask]

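NB. Using a set or sorted requires converting the result to a tuple (e.g. lambda x: tuple(sorted(x))), because duplicated hashes the values, which is not possible with mutable objects such as set or list. Of the two, tuple(sorted(x)) is the safer choice, since the iteration order of a set is not guaranteed to be identical for two equal sets.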

output:

   S.no. Column1 Column2
0      1  00001x  00002x
1      2  00003j  00005k
3      4  00004d  00008e
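
The sorted-tuple variant mentioned in the NB can be sketched as follows, together with a vectorized alternative that sorts each row with NumPy. This is only a sketch: the NumPy route is not part of the original answer, and both versions assume that every value in the two columns is a string (so the values can be compared with each other) and that there are no missing values.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "S.no.": [1, 2, 3, 4],
    "Column1": ["00001x", "00003j", "00002x", "00004d"],
    "Column2": ["00002x", "00005k", "00001x", "00008e"],
})
cols = ["Column1", "Column2"]

# Variant 1: a sorted tuple per row gives a hashable, order-independent key.
mask_tuple = df[cols].apply(lambda row: tuple(sorted(row)), axis=1).duplicated()

# Variant 2: sort the values within each row with NumPy, then let
# DataFrame.duplicated compare the sorted rows.
sorted_pairs = np.sort(df[cols].to_numpy(), axis=1)
mask_numpy = pd.DataFrame(sorted_pairs, index=df.index).duplicated()

# Either mask keeps the first row of each unordered pair.
result = df[~mask_numpy]
print(result)
```

Unlike frozenset, both variants keep repeated values within a row, so they also work when the number of occurrences of an ID matters.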
