Iterate over all columns in pandas dataframe

I have a pandas DataFrame with 100 columns. I want to perform an operation that compares every possible pair of columns (col1 vs. col2, col1 vs. col3, [...] col99 vs. col100).

For example:

colA   colB   colC   colD
   1      1      2      1

So, for example, an equality comparison between two columns should yield yes for colA vs. colB and no for colA vs. colC.

Ideally, I would like to make only unique comparisons: colA vs. colB is the same as colB vs. colA, so only one of the two results should be retained.

Is there any efficient way to do it?

The first thing I would do is define the comparison for a single pair of columns, for example:

(df['col1'] == df['col2']).any()

Next, we need every combination of two columns:

from itertools import combinations
combs = list(combinations(df.columns, 2))

Now we can loop through the pairs and compare them, using the single row from the question (with more than one row, use .all() instead of .any() if every row has to match):

for a, b in combs:
    # print the pair along with the result so each line is identifiable
    print(a, b, (df[a] == df[b]).any())
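
For reference, here is a minimal self-contained sketch of the loop approach that stores the results instead of printing them. The frame is rebuilt from the single-row example in the question, and the name `results` is only illustrative. It uses `.all()` rather than `.any()`, on the assumption that two columns should count as equal only when every row matches.

```python
from itertools import combinations

import pandas as pd

# Example frame from the question (a single row).
df = pd.DataFrame({"colA": [1], "colB": [1], "colC": [2], "colD": [1]})

# Map each unique column pair to whether the two columns match in every row.
# combinations() yields (colA, colB) but never (colB, colA), so each
# unordered pair is compared exactly once.
results = {
    (a, b): bool((df[a] == df[b]).all())
    for a, b in combinations(df.columns, 2)
}

for pair, equal in results.items():
    print(pair, equal)
```

With 100 columns this performs 4950 comparisons, which is usually fine for a plain Python loop.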
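
Another option is to compute the pairwise distances between columns with SciPy; a distance of 0 means the two columns are identical:

import itertools
import pandas as pd
from scipy.spatial.distance import pdist

# pdist works on rows, so transpose to compare columns
pd.Series(pdist(df.T) == 0, index=itertools.combinations(df.columns, 2))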

output:

(colA, colB)     True
(colA, colC)    False
(colA, colD)     True
(colB, colC)    False
(colB, colD)     True
(colC, colD)    False
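
Note that `pdist` expects numeric input. If some columns hold strings or other non-numeric values, a rough alternative is to compare each pair with `Series.equals`. This is a sketch under that assumption; the small frame and the names `pairs` and `same` are made up for illustration.

```python
from itertools import combinations

import pandas as pd

# Hypothetical frame with string columns, where a distance metric does not apply.
df = pd.DataFrame({
    "colA": ["x", "y"],
    "colB": ["x", "y"],
    "colC": ["x", "z"],
})

pairs = list(combinations(df.columns, 2))

# Series.equals returns one boolean per pair: True only if all values match
# (NaNs in the same position are treated as equal).
same = pd.Series([df[a].equals(df[b]) for a, b in pairs], index=pairs)
print(same)
```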

Alternatively, as a matrix:

import pandas as pd
from scipy.spatial.distance import pdist, squareform

# squareform expands the condensed distances into a full symmetric matrix
pd.DataFrame(squareform(pdist(df.T)) == 0, index=df.columns, columns=df.columns)

output:

       colA   colB   colC   colD
colA   True   True  False   True
colB   True   True  False   True
colC  False  False   True  False
colD   True   True  False   True
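
The matrix is symmetric, so every pair appears twice. One way to keep each unordered pair only once, as the question asks, is to read just the entries above the diagonal. The sketch below builds the same boolean matrix with NumPy broadcasting rather than SciPy so that it runs on its own; that substitution, and the names `equal`, `matrix`, and `unique_pairs`, are choices made for this example.

```python
import numpy as np
import pandas as pd

# Example frame from the question (a single row).
df = pd.DataFrame({"colA": [1], "colB": [1], "colC": [2], "colD": [1]})

# One row per original column: shape (n_columns, n_rows).
values = df.to_numpy().T

# Compare every column with every other column; True where all rows match.
equal = (values[:, None, :] == values[None, :, :]).all(axis=2)
matrix = pd.DataFrame(equal, index=df.columns, columns=df.columns)

# Indices strictly above the diagonal: each unordered pair exactly once.
rows, cols = np.triu_indices(len(df.columns), k=1)
unique_pairs = {
    (df.columns[i], df.columns[j]): bool(equal[i, j])
    for i, j in zip(rows, cols)
}
print(unique_pairs)
```

The broadcasted comparison allocates an intermediate array of size n_columns × n_columns × n_rows, so for very tall frames the loop or the `pdist` version may be lighter on memory.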
