Get column names of a data frame based on values from a list in pandas python

Question

I have a dataframe df as follows (only 1 row):

col1    col2    col3    col4    col5
a1       b1     c_d      d1      e10

I have another list, val = ['a1', 'c_d', 'e10']. I want to get the names of the columns whose values are present in val. In this case the column names should end up in a list, colnm = ['col1', 'col3', 'col5']. I did the same in R using:

names(df)[which((df %in% val) == TRUE)]

But I am not able to figure out how to do this in Python, as I am new to it. Any help will be appreciated. TIA.

Answer 1

General solution for multiple rows: test whether at least one value, or all values, in each column come from val.

You can test membership with DataFrame.isin and then reduce per column with DataFrame.any or DataFrame.all:

# added a second row to show the difference
print (df)
  col1 col2 col3 col4 col5
0   a1   b1  c_d   d1  e10
1   a1   d1  c_e   f1  e10

val = ['a1', 'c_d', 'e10']

# test membership of every cell
print (df.isin(val))
   col1   col2   col3   col4  col5
0  True  False   True  False  True
1  True  False  False  False  True

# test if at least one value per column is True
print (df.isin(val).any())
col1     True
col2    False
col3     True
col4    False
col5     True
dtype: bool

# test if all values per column are True
print (df.isin(val).all())
col1     True
col2    False
col3    False
col4    False
col5     True
dtype: bool

names = df.columns[df.isin(val).any()]
print (names)
Index(['col1', 'col3', 'col5'], dtype='object')

names = df.columns[df.isin(val).all()]
print (names)
Index(['col1', 'col5'], dtype='object')
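
The steps above can be put together in one self-contained sketch. The question asks for a plain list rather than an Index, so this version adds Index.tolist() at the end; the variable names mask, any_cols and all_cols are only illustrative and not part of the original answer.

```python
import pandas as pd

# Same two-row frame as printed above
df = pd.DataFrame({
    "col1": ["a1", "a1"],
    "col2": ["b1", "d1"],
    "col3": ["c_d", "c_e"],
    "col4": ["d1", "f1"],
    "col5": ["e10", "e10"],
})
val = ["a1", "c_d", "e10"]

# Boolean frame: True where a cell value is in val
mask = df.isin(val)

# Columns with at least one match, and columns where every row matches
any_cols = df.columns[mask.any()].tolist()
all_cols = df.columns[mask.all()].tolist()

print(any_cols)  # ['col1', 'col3', 'col5']
print(all_cols)  # ['col1', 'col5']
```

Which of the two to use depends on whether a single matching row is enough (any) or every row must match (all).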

If the DataFrame has only one row, it is possible to select the first row as a Series with DataFrame.iloc and then test membership with Series.isin:

names = df.columns[df.iloc[0].isin(val)]
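
For the one-row frame from the question, a minimal runnable sketch could look like this. It rebuilds the data by hand and converts the result to a list named colnm, as in the question; the intermediate name row is just illustrative.

```python
import pandas as pd

# One-row frame from the question
df = pd.DataFrame(
    [["a1", "b1", "c_d", "d1", "e10"]],
    columns=["col1", "col2", "col3", "col4", "col5"],
)
val = ["a1", "c_d", "e10"]

# First row as a Series indexed by column name
row = df.iloc[0]

# Keep the column names whose value in that row is in val
colnm = df.columns[row.isin(val)].tolist()

print(colnm)  # ['col1', 'col3', 'col5']
```

This plays roughly the same role as the R expression in the question: a membership test over the row followed by indexing the names with the resulting boolean vector.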

EDIT: If upgrading to the latest version of pandas does not help, here is one solution that replaces all non-string values in object columns with missing values:

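import numpy as np
import pandas as pd

# the 'content' column mixes strings with a list and an array
data = [
    {'id': 1, 'content': [{'values': 3}]},
    {'id': 2, 'content': 'a1'},
    {'id': 3, 'content': 'c_d'},
    {'id': 4, 'content': np.array([4,5])}
]

df = pd.DataFrame(data)

# True for columns that are not object dtype (kept unchanged)
mask1 = ~df.columns.isin(df.select_dtypes(object).columns)
# True for cells that hold a string
# (applymap is deprecated since pandas 2.1 in favor of DataFrame.map)
mask2 = df.applymap(lambda x: isinstance(x, str))

# keep strings and non-object columns, set everything else to NaN
df = df.where(mask2 | mask1)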
print (df)
   id content
0   1     NaN
1   2      a1
2   3     c_d
3   4     NaN

val = ['a1', 'c_d', 'e10']
print (df.isin(val))
      id  content
0  False    False
1  False     True
2  False     True
3  False    False
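
The cleaning step can then be combined with the column lookup from the first part of the answer. The following is only a sketch under a few assumptions: the helper is_str and the names str_mask, non_object, clean and names are illustrative, and the hasattr check is there because DataFrame.map was added in pandas 2.1 as the replacement for applymap.

```python
import numpy as np
import pandas as pd

data = [
    {"id": 1, "content": [{"values": 3}]},
    {"id": 2, "content": "a1"},
    {"id": 3, "content": "c_d"},
    {"id": 4, "content": np.array([4, 5])},
]
df = pd.DataFrame(data)
val = ["a1", "c_d", "e10"]


def is_str(x):
    """Return True only for string cells (illustrative helper)."""
    return isinstance(x, str)


# Use DataFrame.map where available (pandas >= 2.1), else fall back to applymap
str_mask = df.map(is_str) if hasattr(df, "map") else df.applymap(is_str)

# Columns that are not object dtype are left untouched
non_object = ~df.columns.isin(df.select_dtypes(object).columns)

# Non-string values in object columns become NaN
clean = df.where(str_mask | non_object)

# Column names with at least one value from val
names = clean.columns[clean.isin(val).any()].tolist()
print(names)
```

With this data only the content column holds values from val, so it is the only name returned.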

Answer 1: 2, accepted, 2020-05-19 05:35:49
