
Python: How to iterate through a range of columns in a dataframe, check for specific values and store column name in a list

I am trying to iterate through a range of columns in a data frame and check for specific values in every row. The values should be matched against my list. If a row contains a value from my list, the name of the column where the first match occurs should be appended to a new list. How can I achieve this? I tried the following for loop but couldn't get it right.

I've looked at a few examples but couldn't find what I was looking for.

iterating through a column in dataframe and creating a list with name of the column + str

How to get the column name for a specific values in every row of a dataframe


import pandas as pd

random = {
        'col1': ['45c','5v','27','k22','wh','u5','36'],
        'col2': ['abc','bca','cab','bac','cab','aab','ccb'],
        'col3': ['xyz','zxy','yxz','zzy','yyx','xyx','zzz'],
        'col4': ['52','75c','k22','d2','3n','4b','cc'],
        'col5': ['tuv','vut','tut','vtu','uvt','uut','vvt'],
        'col6': ['la3','pl','5v','45c','3s','k22','9i']
        }

df = pd.DataFrame(random)

"""
Only 1 value from this list should match with the values in each row of the df
i.e if '45c' is in row 3, then it's a match. place the name of column where '45c' is found in the df in the new list
"""
list = ['45c','5v','d2','3n','k22',]

"""
empty list that should be populated with df column names if there is a single match
"""
rand = []
for row in df.iloc[:,2:5]:
    for x in row:
        if df[x] in list:
            rand.append(df[row][x].columns)
            break

print(rand)

#this is what my df looks like when I print it
  col1 col2 col3 col4 col5 col6
0  45c  abc  xyz   52  tuv  la3
1   5v  bca  zxy  75c  vut   pl
2   27  cab  yxz  k22  tut   5v
3  k22  bac  zzy   d2  vtu  45c
4   wh  cab  yyx   3n  uvt   3s
5   u5  aab  xyx   4b  uut  k22
6   36  ccb  zzz   cc  vvt   9i

The output I was hoping to get is as follows:

rand = ['col1','col4','col1','col6']

First compare all values against the list with DataFrame.isin, then get the column of the first matched value in each row with DataFrame.idxmax. Because idxmax returns the first column when a row has no match at all, add a condition with DataFrame.any to test for that case:

import numpy as np

L = ['45c','5v','d2','3n','k22']
m = df.isin(L)
out = np.where(m.any(axis=1), m.idxmax(axis=1), 'no match').tolist()
print (out)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6', 'no match']

If only the rows with a match are needed:

out1 = m.idxmax(axis=1)[m.any(axis=1)].tolist()
print (out1)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6']

Detail:

print (m)
    col1   col2   col3   col4   col5   col6
0   True  False  False  False  False  False
1   True  False  False  False  False  False
2  False  False  False   True  False   True
3   True  False  False   True  False   True
4  False  False  False   True  False  False
5  False  False  False  False  False   True
6  False  False  False  False  False  False
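
To see why the any check matters, here is a minimal sketch on a made-up mask (the frame `demo` and its column names `a` and `b` are hypothetical, not data from the question). On a row that is entirely False, every value ties for the maximum, so idxmax falls back to the first column label:

```python
import pandas as pd

# Hypothetical boolean mask: row 0 has a match in 'b', row 1 has no match at all.
demo = pd.DataFrame({'a': [False, False], 'b': [True, False]})

first_true = demo.idxmax(axis=1)  # label of the first maximum in each row
has_match = demo.any(axis=1)      # True only where the row contains a match

print(first_true.tolist())  # ['b', 'a']  <- 'a' for row 1 is misleading
print(has_match.tolist())   # [True, False]
```

Filtering or masking the idxmax result with `has_match` is what keeps the misleading `'a'` out of the final list.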

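A loop solution is possible, but not recommended:

rand = []
for i, row in df.iterrows():
    # row.items() yields (column name, value) pairs for this row
    for col, x in row.items():
        if x in L:
            rand.append(col)
            break  # keep only the first matching column in each row
print(rand)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6']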
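
The question's attempt sliced the frame with `df.iloc[:, 2:5]`, so if only that range of columns should be searched, the same isin / idxmax / any idea can be applied to the slice. This is a sketch under that assumption; the names `sub`, `m_sub` and `out_sub` are introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['45c', '5v', '27', 'k22', 'wh', 'u5', '36'],
    'col2': ['abc', 'bca', 'cab', 'bac', 'cab', 'aab', 'ccb'],
    'col3': ['xyz', 'zxy', 'yxz', 'zzy', 'yyx', 'xyx', 'zzz'],
    'col4': ['52', '75c', 'k22', 'd2', '3n', '4b', 'cc'],
    'col5': ['tuv', 'vut', 'tut', 'vtu', 'uvt', 'uut', 'vvt'],
    'col6': ['la3', 'pl', '5v', '45c', '3s', 'k22', '9i'],
})
L = ['45c', '5v', 'd2', '3n', 'k22']

# Positions 2, 3 and 4, i.e. col3, col4 and col5 (the end of the slice is exclusive).
sub = df.iloc[:, 2:5]
m_sub = sub.isin(L)

# First matching column per row, keeping only rows that have a match in the slice.
out_sub = m_sub.idxmax(axis=1)[m_sub.any(axis=1)].tolist()
print(out_sub)  # ['col4', 'col4', 'col4']
```

Within this slice only col4 holds values from the list (rows 2, 3 and 4), so matches in col1 and col6 are ignored.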
