Python: How to iterate through a range of columns in a dataframe, check for specific values and store column name in a list
I am trying to iterate through a range of columns in a data frame and check for specific values in every row. The values should be matched against my list. If a row contains a value from my list, the name of the column where the first match occurs should be appended to my new list. How can I achieve this? I tried the following for loop but couldn't get it right.
I've looked at a few examples but couldn't find what I was looking for.
iterating through a column in dataframe and creating a list with name of the column + str
How to get the column name for a specific values in every row of a dataframe
import pandas as pd
random = {
'col1': ['45c','5v','27','k22','wh','u5','36'],
'col2': ['abc','bca','cab','bac','cab','aab','ccb'],
'col3': ['xyz','zxy','yxz','zzy','yyx','xyx','zzz'],
'col4': ['52','75c','k22','d2','3n','4b','cc'],
'col5': ['tuv','vut','tut','vtu','uvt','uut','vvt'],
'col6': ['la3','pl','5v','45c','3s','k22','9i']
}
df = pd.DataFrame(random)
"""
Only 1 value from this list should match with the values in each row of the df
i.e if '45c' is in row 3, then it's a match. place the name of column where '45c' is found in the df in the new list
"""
list = ['45c','5v','d2','3n','k22',]
"""
empty list that should be populated with df column names if there is a single match
"""
rand = []
for row in df.iloc[:,2:5]:
    for x in row:
        if df[x] in list:
            rand.append(df[row][x].columns)
            break
print(rand)
#this is what my df looks like when I print it
  col1 col2 col3 col4 col5 col6
0  45c  abc  xyz   52  tuv  la3
1   5v  bca  zxy  75c  vut   pl
2   27  cab  yxz  k22  tut   5v
3  k22  bac  zzy   d2  vtu  45c
4   wh  cab  yyx   3n  uvt   3s
5   u5  aab  xyx   4b  uut  k22
6   36  ccb  zzz   cc  vvt   9i
The output I was hoping to get is as follows:
rand = ['col1','col4','col1','col6']
First compare all values with DataFrame.isin and get the column of the first matched value with DataFrame.idxmax. Because idxmax returns the first column when a row has no match at all, add a condition with DataFrame.any to test for that case:
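import numpy as np

L = ['45c','5v','d2','3n','k22']
# Boolean mask: True where a cell holds one of the values in L
m = df.isin(L)
# Column of the first True per row, or 'no match' if the row has none
out = np.where(m.any(axis=1), m.idxmax(axis=1), 'no match').tolist()
print(out)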
['col1', 'col1', 'col4', 'col1', 'col4', 'col6', 'no match']
If you need only the matched values:
out1 = m.idxmax(axis=1)[m.any(axis=1)].tolist()
print(out1)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6']
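The question asks about a range of columns (it slices with df.iloc[:, 2:5]), so here is a minimal sketch of the same isin / idxmax / any approach applied to that slice only. The variable names data, sub and out_range are chosen for illustration and are not from the original post.

```python
import pandas as pd

# Same sample data as in the question
data = {
    'col1': ['45c', '5v', '27', 'k22', 'wh', 'u5', '36'],
    'col2': ['abc', 'bca', 'cab', 'bac', 'cab', 'aab', 'ccb'],
    'col3': ['xyz', 'zxy', 'yxz', 'zzy', 'yyx', 'xyx', 'zzz'],
    'col4': ['52', '75c', 'k22', 'd2', '3n', '4b', 'cc'],
    'col5': ['tuv', 'vut', 'tut', 'vtu', 'uvt', 'uut', 'vvt'],
    'col6': ['la3', 'pl', '5v', '45c', '3s', 'k22', '9i'],
}
df = pd.DataFrame(data)
L = ['45c', '5v', 'd2', '3n', 'k22']

# Restrict the check to positions 2, 3 and 4 (col3, col4, col5)
sub = df.iloc[:, 2:5]
m = sub.isin(L)

# Keep only rows that have at least one match inside the slice
out_range = m.idxmax(axis=1)[m.any(axis=1)].tolist()
print(out_range)
```

With this slice, only col4 contains values from L, so every reported match comes from that column.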
Detail:
print (m)
    col1   col2   col3   col4   col5   col6
0   True  False  False  False  False  False
1   True  False  False  False  False  False
2  False  False  False   True  False   True
3   True  False  False   True  False   True
4  False  False  False   True  False  False
5  False  False  False  False  False   True
6  False  False  False  False  False  False
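To see why the DataFrame.any check is needed, here is a small sketch on a hypothetical two-row boolean mask (the column labels a, b, c are made up for illustration). idxmax returns the label of the first True in each row, but for a row with no True it still returns the first column label, which would look like a match.

```python
import pandas as pd

# Hypothetical mask: row 0 has matches in 'b' and 'c', row 1 has none
mask = pd.DataFrame({
    'a': [False, False],
    'b': [True, False],
    'c': [True, False],
})

first = mask.idxmax(axis=1)    # first True per row; first column if all False
has_match = mask.any(axis=1)   # True only for rows with at least one match

print(first.tolist())
print(has_match.tolist())
```

Row 1 reports 'a' from idxmax even though nothing matched, so filtering or masking with has_match is what separates real matches from that fallback.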
A loop solution is possible, but not recommended:
rand = []
for i, row in df.iterrows():
    # row.items() yields (column name, value) pairs for this row
    for c, x in row.items():
        if x in L:
            rand.append(c)
            # stop at the first match in this row
            break
print(rand)
['col1', 'col1', 'col4', 'col1', 'col4', 'col6']
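If the loop version should also keep a placeholder for rows without a match, as the np.where variant does, one possible sketch uses next() with a default. The names data, wanted and rand_all are illustrative and not from the original answer.

```python
import pandas as pd

# Same sample data as in the question
data = {
    'col1': ['45c', '5v', '27', 'k22', 'wh', 'u5', '36'],
    'col2': ['abc', 'bca', 'cab', 'bac', 'cab', 'aab', 'ccb'],
    'col3': ['xyz', 'zxy', 'yxz', 'zzy', 'yyx', 'xyx', 'zzz'],
    'col4': ['52', '75c', 'k22', 'd2', '3n', '4b', 'cc'],
    'col5': ['tuv', 'vut', 'tut', 'vtu', 'uvt', 'uut', 'vvt'],
    'col6': ['la3', 'pl', '5v', '45c', '3s', 'k22', '9i'],
}
df = pd.DataFrame(data)
wanted = {'45c', '5v', 'd2', '3n', 'k22'}  # a set makes membership tests cheap

rand_all = []
for _, row in df.iterrows():
    # First column whose value is in `wanted`, else the placeholder
    first_col = next((col for col, val in row.items() if val in wanted), 'no match')
    rand_all.append(first_col)

print(rand_all)
```

This is still a row-by-row loop, so the vectorized isin / idxmax version is likely the better choice on large frames.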