Iterate over all columns in pandas dataframe

Question

I have a Pandas dataframe with 100 columns. I want to perform an operation that compares every possible pair of columns with each other (col1 vs. col2, col1 vs. col3, [...], col99 vs. col100).

For example:

colA   colB   colC   colD
   1      1      2      1

So, for example, checking whether two values are equal should yield yes for colA vs. colB and no for colA vs. colC.

Ideally, I would like to make only unique comparisons: colA vs. colB is the same as colB vs. colA, so only one of the two results should be kept.

Is there any efficient way to do it?
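To make the example concrete, here is one way to build the one-row frame from the table and run the two checks described above. This is a sketch added for reproducibility, not part of the original question; only the column names and values come from the table.

```python
import pandas as pd

# The one-row example from the question (values copied from the table).
df = pd.DataFrame({"colA": [1], "colB": [1], "colC": [2], "colD": [1]})

# Element-wise equality of two columns gives a boolean Series;
# .all() collapses it to a single True/False for the pair.
a_vs_b = bool((df["colA"] == df["colB"]).all())  # True  ("yes")
a_vs_c = bool((df["colA"] == df["colC"]).all())  # False ("no")
print(a_vs_b, a_vs_c)  # True False
```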

Answer 1

The first thing I would do is define the comparison, for example:

(df['col1'] == df['col2']).any()
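With the single-row example, `.any()` and `.all()` give the same answer. With more rows they differ: `.any()` reports a match if at least one row is equal, while `.all()` requires every row to be equal, which is usually what "these two columns are equal" means. A small sketch with a hypothetical two-row frame (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical two-row frame: col1 and col2 agree in row 0 only.
df = pd.DataFrame({"col1": [1, 5], "col2": [1, 7]})

equal_rows = df["col1"] == df["col2"]  # boolean Series: [True, False]
any_match = bool(equal_rows.any())     # True: at least one row matches
all_match = bool(equal_rows.all())     # False: not every row matches
print(any_match, all_match)            # True False
```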

What we need next is every combination (unordered pair) of columns:

from itertools import combinations
combs = list(combinations(df.columns, 2))
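`combinations(columns, 2)` yields each unordered pair exactly once, keeps the input order, and never pairs a column with itself, which is exactly the "unique comparisons" the question asks for. For n columns that is n(n-1)/2 pairs, so 100 columns give 4950 comparisons. A quick check using placeholder column names (`col1` through `col100` are assumed, not taken from a real frame):

```python
from itertools import combinations

# Placeholder names standing in for the question's 100 columns.
columns = [f"col{i}" for i in range(1, 101)]

pairs = list(combinations(columns, 2))

# Each unordered pair appears once: n * (n - 1) / 2 of them.
n = len(columns)
assert len(pairs) == n * (n - 1) // 2 == 4950
print(pairs[:3])  # [('col1', 'col2'), ('col1', 'col3'), ('col1', 'col4')]
```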

Now we can loop through the pairs and apply the one-line comparison from above to each of them:

for col_a, col_b in combs:
    # print the pair together with the result so each line is identifiable
    print(col_a, col_b, (df[col_a] == df[col_b]).any())
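Printing works for a quick look, but with 100 columns the loop produces 4950 lines, so it can be easier to collect the results. A minimal sketch that keeps one boolean per pair in a dict and a pandas Series (the names `results`, `matches` and `equal_pairs` are illustrative); it uses `.all()` rather than `.any()`, assuming "equal" should mean equal on every row:

```python
from itertools import combinations

import pandas as pd

# Same one-row example as in the question.
df = pd.DataFrame({"colA": [1], "colB": [1], "colC": [2], "colD": [1]})

# One entry per unordered column pair; .all() requires every row to match.
results = {
    (a, b): bool((df[a] == df[b]).all())
    for a, b in combinations(df.columns, 2)
}
matches = pd.Series(results)  # tuple keys become a two-level index
print(matches)

# Only the pairs whose columns are identical.
equal_pairs = [pair for pair, same in results.items() if same]
print(equal_pairs)  # [('colA', 'colB'), ('colA', 'colD'), ('colB', 'colD')]
```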

Answer 2

import itertools
import pandas as pd
from scipy.spatial.distance import pdist
# pdist compares rows, so transpose; a distance of 0 means identical columns
pd.Series(pdist(df.T)==0, index=itertools.combinations(df.columns, 2))

output:

(colA, colB)     True
(colA, colC)    False
(colA, colD)     True
(colB, colC)    False
(colB, colD)     True
(colC, colD)    False
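`pdist` returns its condensed distances in the same order in which `itertools.combinations` yields pairs ((0, 1), (0, 2), and so on, followed by (1, 2)), which is why the two line up as data and index. A Euclidean distance of 0 occurs exactly when two columns match on every row (for non-NaN values), so this is an all-rows comparison. If SciPy is not available, a NumPy-only sketch (a swapped-in technique, not this answer's method) gives the same pairs: `np.triu_indices(n, k=1)` enumerates the upper-triangle index pairs in that same order, and the columns are compared directly instead of through a floating-point distance.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"colA": [1], "colB": [1], "colC": [2], "colD": [1]})

values = df.to_numpy()
n = values.shape[1]

# Upper-triangle index pairs (i < j), in the same order as
# itertools.combinations(range(n), 2).
i, j = np.triu_indices(n, k=1)

# values[:, i] and values[:, j] have shape (rows, n_pairs), i.e. about
# 4950 values per row for 100 columns. A pair matches when every row is equal.
same = (values[:, i] == values[:, j]).all(axis=0)

index = pd.MultiIndex.from_arrays([df.columns[i], df.columns[j]])
result = pd.Series(same, index=index)
print(result)
```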

Alternatively, as a matrix:

import pandas as pd
from scipy.spatial.distance import pdist, squareform
# squareform expands the condensed distances into a full n x n matrix
pd.DataFrame(squareform(pdist(df.T)) == 0, index=df.columns, columns=df.columns)

output:

       colA   colB   colC   colD
colA   True   True  False   True
colB   True   True  False   True
colC  False  False   True  False
colD   True   True  False   True
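The matrix form can also be built without SciPy by broadcasting a column-vs-column comparison. This is a swapped-in NumPy technique, sketched on the question's example. It creates a (rows, n, n) boolean array, so memory grows by roughly n² bytes per row (about 10 KB per row for 100 columns). The diagonal is always True, and because the matrix is symmetric, only the strict upper triangle is needed to get each unique pair once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"colA": [1], "colB": [1], "colC": [2], "colD": [1]})

values = df.to_numpy()

# values[:, :, None] has shape (rows, n, 1) and values[:, None, :] has shape
# (rows, 1, n); comparing them broadcasts to (rows, n, n).
# .all(axis=0) keeps True only where two columns agree on every row.
equal = (values[:, :, None] == values[:, None, :]).all(axis=0)

matrix = pd.DataFrame(equal, index=df.columns, columns=df.columns)
print(matrix)

# Unique pairs only: keep the strict upper triangle (i < j).
upper = np.triu(equal, k=1)
unique_pairs = [
    (df.columns[i], df.columns[j]) for i, j in zip(*np.nonzero(upper))
]
print(unique_pairs)  # [('colA', 'colB'), ('colA', 'colD'), ('colB', 'colD')]
```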

2 answers

solution1 1 2021-07-12 18:00:48

solution2 1 ACCEPTED 2021-07-12 18:15:45
