Finding common column names across multiple DataFrames

Question

I have multiple datasets that share some column names, as in the example below. I want to collect the column names that are repeated across all of the datasets into a list, using Python and pandas.

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                    'B': 'one one two three two two one three'.split(),
                    'C': np.arange(8),
                    'D': np.arange(8) * 2})
df2 = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                    'B': 'one one two three two two one three'.split(),
                    'C': np.arange(8)})
df3 = pd.DataFrame({'A': 'foo bar foo bar foo bar foo foo'.split(),
                    'B': 'one one two three two two one three'.split(),
                    'D': np.arange(8) * 2})

As shown above, the three datasets df1, df2 and df3 all contain the columns 'A' and 'B', so the expected output is ['A', 'B']. Could someone please suggest a solution to this problem? Thanks in advance.

Answer 1

A DataFrame's columns attribute is of type pandas.core.indexes.base.Index, so you can use its intersection method to find the overlapping elements. Here is an example:

import pandas as pd
import numpy as np

a = np.arange(1,4)
b = np.arange(5,8)
c = np.random.randint(0,10,size=3)
d = np.random.randint(0,10,size=3)
df_1 = pd.DataFrame({'a':a,'b':b,'c':c,'d':d})
df_1

out:

    a   b   c   d
0   1   5   5   1
1   2   6   7   5
2   3   7   6   9

a = np.arange(4,7)
b = np.arange(7,10)
e = np.random.randint(0,10,size=3)
f = np.random.randint(0,10,size=3)
df_2 = pd.DataFrame({'a':a,'b':b,'e':e,'f':f})
df_2

out:

    a   b   e   f
0   4   7   9   9
1   5   8   9   3
2   6   9   2   1

df_1.columns.intersection(df_2.columns)

out:

Index(['a', 'b'], dtype='object')

type(df_1.columns)

out:

pandas.core.indexes.base.Index

Answer 2

Pandas can give you the list of column names. For example, list(df1.columns) returns ['A', 'B', 'C', 'D']. You can get the column names of each dataframe in the same way.

Then you just need to find the intersection of all these lists.
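One way to do that is sketched below with Python's built-in sets. The empty frames are stand-ins that only reproduce the column names from the question, and the names frames, common and common_columns are made up for this illustration.

```python
import pandas as pd

# Stand-in frames: only the column names from the question matter here.
df1 = pd.DataFrame(columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(columns=['A', 'B', 'C'])
df3 = pd.DataFrame(columns=['A', 'B', 'D'])

frames = [df1, df2, df3]

# Turn each frame's column names into a set, then intersect all the sets.
common = set.intersection(*(set(df.columns) for df in frames))

# Sets are unordered, so sort the names to get a stable list.
common_columns = sorted(common)
print(common_columns)  # ['A', 'B']
```

Because sets do not keep order, the sorted call is what makes the result predictable; drop it if the order of the names does not matter.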

Answer 3

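I think the simplest way is to use & to intersect all the column names. (Note that this relies on older pandas behaviour: in pandas 2.0 and later, & on an Index is an element-wise logical operation rather than a set intersection, so use Index.intersection there.)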

a = df1.columns & df2.columns & df3.columns
print (a)
Index(['A', 'B'], dtype='object')

If you need a list:

a = (df1.columns & df2.columns & df3.columns).tolist()
print (a)
['A', 'B']
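For any number of dataframes, the same result can probably be reached by chaining Index.intersection, which does not depend on how & is interpreted. This is only a sketch: the empty frames are stand-ins with the question's column names, and frames, common and result are illustrative names.

```python
from functools import reduce

import pandas as pd

# Stand-in frames: only the column names from the question matter here.
df1 = pd.DataFrame(columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(columns=['A', 'B', 'C'])
df3 = pd.DataFrame(columns=['A', 'B', 'D'])

frames = [df1, df2, df3]

# Fold Index.intersection over the column Index of every frame.
common = reduce(lambda left, right: left.intersection(right),
                (df.columns for df in frames))

# Convert the resulting Index to a plain Python list.
result = common.tolist()
print(result)  # ['A', 'B']
```

Using reduce keeps the code the same whether there are three dataframes or thirty.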


solution1
0 2018-03-07 08:36:51

solution2
0 ACCEPTED 2018-03-07 08:37:18

solution3
0 2018-03-07 08:45:14
