
Find value in Pandas df row and return the column name

There's probably an easy way to do this, but I hit a wall on this one.

I have a dataframe whose cells contain text. I'm trying to add one new column per distinct text value, where each new column holds the name of the existing column in which that value appears.

import pandas as pd

test_data = {
    'Col1' : ['Boy', 'Boy', 'Boy', 'Boy', 'Boy'],
    'Col2' : ['Girl', 'Girl', 'Girl', 'Baseball', 'Girl'],
    'Col3' : ['Baseball', 'Baseball', 'Baseball', 'Lizard', 'Baseball'],
    'Col4' : ['Lizard', 'Hockey', 'Lizard', 'Girl', 'Hockey']
}

df = pd.DataFrame(test_data, columns = ['Col1', 'Col2', 'Col3', 'Col4'])
print(df)

  Col1      Col2      Col3    Col4
0  Boy      Girl  Baseball  Lizard
1  Boy      Girl  Baseball  Hockey
2  Boy      Girl  Baseball  Lizard
3  Boy  Baseball    Lizard    Girl
4  Boy      Girl  Baseball  Hockey

# Create the new columns, filled with a placeholder value (99) for now
for col in ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey']:
    df[col] = 99
    
print(df)

  Col1      Col2      Col3    Col4  Boy  Girl  Lizard  Baseball  Hockey
0  Boy      Girl  Baseball  Lizard   99    99      99        99      99
1  Boy      Girl  Baseball  Hockey   99    99      99        99      99
2  Boy      Girl  Baseball  Lizard   99    99      99        99      99
3  Boy  Baseball    Lizard    Girl   99    99      99        99      99
4  Boy      Girl  Baseball  Hockey   99    99      99        99      99

What I'd like to get is the result below. If it matters, a string can appear at most once per row, and it may not appear at all. I found a method using argsort, but that doesn't work with strings. Thanks very much.

answers = {
    'Boy' : ['Col1', 'Col1', 'Col1', 'Col1', 'Col1'],
    'Girl' : ['Col2', 'Col2', 'Col2', 'Col4', 'Col2'],
    'Lizard' : ['Col4', 0, 'Col4', 'Col3', 0],
    'Baseball' : ['Col3', 'Col3', 'Col3', 'Col2', 'Col3'],
    'Hockey' : [ 0, 'Col4', 0, 0, 'Col4']
}
df_answers = pd.DataFrame(answers, columns = ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey'])
print(df_answers)

    Boy  Girl Lizard Baseball Hockey
0  Col1  Col2   Col4     Col3      0
1  Col1  Col2      0     Col3   Col4
2  Col1  Col2   Col4     Col3      0
3  Col1  Col4   Col3     Col2      0
4  Col1  Col2      0     Col3   Col4

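You can stack the dataframe and then pivot it:

# df is the original four-column frame, before the placeholder columns are added
s = df.stack().reset_index(name='var')
# pivot takes keyword arguments; positional arguments are rejected by pandas 2.0 and later
s.pivot(index='level_0', columns='var', values='level_1').rename_axis(index=None, columns=None)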

Details:

.stack the dataframe into long form (one row per cell), then reset_index to turn the row labels and column names into regular columns:

    level_0 level_1       var
0         0    Col1       Boy
1         0    Col2      Girl
2         0    Col3  Baseball
3         0    Col4    Lizard
4         1    Col1       Boy
5         1    Col2      Girl
6         1    Col3  Baseball
7         1    Col4    Hockey
8         2    Col1       Boy
9         2    Col2      Girl
10        2    Col3  Baseball
11        2    Col4    Lizard
12        3    Col1       Boy
13        3    Col2  Baseball
14        3    Col3    Lizard
15        3    Col4      Girl
16        4    Col1       Boy
17        4    Col2      Girl
18        4    Col3  Baseball
19        4    Col4    Hockey
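
For reference, here is a minimal sketch of where the level_0 and level_1 names come from. It uses a hypothetical two-row frame called small, which is not part of the original question. Because neither the index nor the columns of the frame are named, reset_index falls back to the default labels level_0 (the row label) and level_1 (the column name):

```python
import pandas as pd

# Hypothetical miniature of the question's data, for illustration only.
small = pd.DataFrame({'Col1': ['Boy', 'Boy'], 'Col2': ['Girl', 'Baseball']})

# stack() returns a Series indexed by (row label, column name).
stacked = small.stack()

# reset_index() moves both index levels into columns; name= labels the values.
long_form = stacked.reset_index(name='var')

print(long_form.columns.tolist())
print(long_form)
```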

.pivot the stacked frame to reshape it into a new dataframe with level_0 as its index, var as its columns, and level_1 as its values:

  Baseball   Boy  Girl Hockey Lizard
0     Col3  Col1  Col2    NaN   Col4
1     Col3  Col1  Col2   Col4    NaN
2     Col3  Col1  Col2    NaN   Col4
3     Col2  Col1  Col4    NaN   Col3
4     Col3  Col1  Col2   Col4    NaN
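
The pivoted frame has its columns in alphabetical order and uses NaN for missing strings, while the question asks for a specific column order and 0 for missing strings. The sketch below is one possible way to close that gap, assuming the original four-column df. The names located, wanted, and result are illustrative, and the astype(object) call is a precaution so that an integer 0 can sit next to the strings regardless of how pandas stores text:

```python
import pandas as pd

test_data = {
    'Col1': ['Boy', 'Boy', 'Boy', 'Boy', 'Boy'],
    'Col2': ['Girl', 'Girl', 'Girl', 'Baseball', 'Girl'],
    'Col3': ['Baseball', 'Baseball', 'Baseball', 'Lizard', 'Baseball'],
    'Col4': ['Lizard', 'Hockey', 'Lizard', 'Girl', 'Hockey'],
}
df = pd.DataFrame(test_data)

# Long form: one row per (row label, column name, cell value).
s = df.stack().reset_index(name='var')

# Wide form: cell values become the columns, column names become the values.
located = s.pivot(index='level_0', columns='var', values='level_1')
located = located.rename_axis(index=None, columns=None)

# Assumption: use the question's column order and 0 for "not in this row".
wanted = ['Boy', 'Girl', 'Lizard', 'Baseball', 'Hockey']
located = located.reindex(columns=wanted).astype(object).fillna(0)

# Attach the location columns to the original data.
result = df.join(located)
print(result)
```

Note that pivot raises an error if a string appears more than once in the same row, so this relies on the question's guarantee that each string appears at most once per row.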

