
Generating a dictionary of column names based on a condition among columns of a dataframe

I have the following DataFrame:

                        a_11        b_14    c_13     d_12
AC                      True        False   False   False 
BA                      True        False   False   True
AA                      False       False   False   False 

I want a dictionary whose keys are the index labels and whose values are the lists of column names that hold True in that row, i.e.

{
AC : [a_11],
BA : [a_11,d_12],
AA : []
}

How should I approach this problem?

Edit: the column names are strings, not single characters.
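For anyone who wants to reproduce the example, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes the printed True/False values are real booleans (dtype bool) and not strings.

```python
import pandas as pd

# Rebuild the sample frame from the question.
# Assumption: the cells are real booleans, not the strings "True"/"False".
df = pd.DataFrame(
    {
        "a_11": [True, True, False],
        "b_14": [False, False, False],
        "c_13": [False, False, False],
        "d_12": [False, True, False],
    },
    index=["AC", "BA", "AA"],
)
print(df)
```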

If performance matters, use a dictionary comprehension over the transposed DataFrame and convert the selected column names to a list:

d = {k: v.index[v].tolist() for k, v in df.T.items()}
print (d)
{'AC': ['a_11'], 'BA': ['a_11', 'd_12'], 'AA': []}

Another option is to zip the index with the rows of the 2D NumPy array returned by DataFrame.to_numpy:

d = {k: df.columns[v].tolist() for k, v in zip(df.index, df.to_numpy())}
print (d)
{'AC': ['a_11'], 'BA': ['a_11', 'd_12'], 'AA': []}
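For a self-contained version of the zip-based idea, here is a minimal sketch. The helper name true_columns_by_row is made up for illustration, and the sample frame is rebuilt on the assumption that its cells are plain booleans.

```python
import pandas as pd


def true_columns_by_row(frame: pd.DataFrame) -> dict:
    """Map each index label to the list of column names that are True in that row.

    Hypothetical helper (not part of pandas); assumes a boolean frame.
    """
    # Each row of the 2D array is a boolean mask applied to the column Index.
    masks = frame.to_numpy(dtype=bool)
    return {
        label: frame.columns[mask].tolist()
        for label, mask in zip(frame.index, masks)
    }


# Sample data rebuilt from the question (assumed to be dtype bool).
df = pd.DataFrame(
    {
        "a_11": [True, True, False],
        "b_14": [False, False, False],
        "c_13": [False, False, False],
        "d_12": [False, True, False],
    },
    index=["AC", "BA", "AA"],
)

result = true_columns_by_row(df)
print(result)
# {'AC': ['a_11'], 'BA': ['a_11', 'd_12'], 'AA': []}
```

Working on the NumPy array avoids building a Series per row, which is likely why this style tends to be quicker than row-wise apply on larger frames.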

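You can use df.mul to multiply df by df.columns, which turns each True into its column name and each False into an empty string '', then use df.agg to filter out the empty strings. The result is a Series of lists, not a dictionary: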

out = df.mul(df.columns).agg(lambda x:[*filter(None, x)], axis=1)

AC          [a_11]
BA    [a_11, d_12]
AA              []
dtype: object

You can also use a list comprehension over the rows of df.values and wrap the result in a Series:

vals = [df.columns[m].tolist() for m in df.values]
# vals -> [['a_11'], ['a_11', 'd_12'], []]
pd.Series(vals, index=df.index)

AC          [a_11]
BA    [a_11, d_12]
AA              []
dtype: object
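Since the question asks for a dictionary, not a Series, the Series can be converted with Series.to_dict. A short sketch, again assuming the sample frame holds plain booleans:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be dtype bool).
df = pd.DataFrame(
    {
        "a_11": [True, True, False],
        "b_14": [False, False, False],
        "c_13": [False, False, False],
        "d_12": [False, True, False],
    },
    index=["AC", "BA", "AA"],
)

# One list of column names per row, using each row as a boolean mask.
vals = [df.columns[m].tolist() for m in df.to_numpy(dtype=bool)]
s = pd.Series(vals, index=df.index)

# Series.to_dict maps index labels to values, giving the requested dictionary.
d = s.to_dict()
print(d)
# {'AC': ['a_11'], 'BA': ['a_11', 'd_12'], 'AA': []}
```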

